[ https://issues.apache.org/jira/browse/FALCON-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797093#comment-13797093 ]

Venkatesh Seetharam commented on FALCON-129:
--------------------------------------------

Thanks a ton [~sriksun] for taking the time to review this humongous patch.

bq. 2. Possibly incorrect checkstyle warning suppression
Good catch. 

bq. 3. Process involving table storage shouldn't be considered for late handling
Very good catch. Thanks!

bq. 4. FeedCleanupHandler uses the FileStatus array for deletion.
Will do. The check is missing in the abstract handler as well; I will add it in the delete method.

bq. 5. Would it help to have test cases added to FeedEvictor for the catalog storage type?
The tests are covered in int-tests, since mocking the static CatalogService is hard: org.apache.falcon.catalog.TableStorageFeedEvictorIT covers both managed and external tables.

bq. 6. From the FeedEntityParser code, it looks like feed entities with a late arrival section are rejected,
In the common module, parse is called but validate is not. All validations that require services live in int-tests, which is why this was not caught. Will definitely change the entity.

bq. 7. Any specific reason to comment this out in oozie-workflow-0.3.xsd?
Good question. I had to add an xs:any element for the hive actions in replication, and it conflicted with the other xs:any used for sla. Hence I commented the sla element out, as we are not using it in Falcon and it is too specific to Yahoo! and GMS.
{code}<xs:any namespace="##other" minOccurs="1" maxOccurs="1"/>{code}
There are ways around this:
* specific bindings in JAXB, but I thought that was unnecessary anyway
* java actions instead of hive actions for import and export; we should do this in the future so that it is portable across Oozie

bq. This is indeed a very complex feature, and the patch is very clean and the changes are fairly intuitive.
Thanks! :-)

I plan to upload the cumulative patch to this JIRA.

> Disable Late data handling for hive tables
> ------------------------------------------
>
>                 Key: FALCON-129
>                 URL: https://issues.apache.org/jira/browse/FALCON-129
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-129.patch, FALCON-129-r1.patch
>
>
> Neither HCat nor Hive APIs expose internal stats about a given partition. The only 
> way to get the partition size is to get the location of the partition on HDFS 
> and then use the globStatus and contentSummary APIs.
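On HDFS, the two steps described in the issue map onto Hadoop's FileSystem.globStatus(Path) and FileSystem.getContentSummary(Path).getLength(); note that globStatus may return null (for example, for a non-wildcard path that does not exist), so callers need to guard against that. The sketch below mirrors the same two steps on a local filesystem with java.nio so that it runs without a Hadoop dependency. The class PartitionSize, the method sizeOf and the part-* file naming are illustrative assumptions, not Falcon code.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/**
 * Hypothetical helper (not part of Falcon): a local-filesystem analogue of
 * sizing a partition with Hadoop's globStatus + getContentSummary.
 */
final class PartitionSize {
    private PartitionSize() {}

    /**
     * Sums the bytes under every entry of partitionDir whose file name matches glob.
     * A missing directory yields 0, mirroring a null result from globStatus.
     */
    static long sizeOf(Path partitionDir, String glob) throws IOException {
        if (!Files.isDirectory(partitionDir)) {
            return 0L;
        }
        long total = 0L;
        // Step 1: expand the glob, analogous to FileSystem.globStatus(Path).
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(partitionDir, glob)) {
            for (Path match : matches) {
                // Step 2: add each match's size, analogous to getContentSummary(...).getLength().
                total += contentLength(match);
            }
        }
        return total;
    }

    /** Total size in bytes of all regular files at or below path. */
    private static long contentLength(Path path) throws IOException {
        long sum = 0L;
        try (Stream<Path> walk = Files.walk(path)) {
            for (Path p : (Iterable<Path>) walk::iterator) {
                if (Files.isRegularFile(p)) {
                    sum += Files.size(p);
                }
            }
        }
        return sum;
    }
}
```

For example, with part-00000 (3 bytes), part-00001 (4 bytes) and _SUCCESS (10 bytes) in one directory, sizeOf(dir, "part-*") returns 7. Files.walk recurses into sub-directories, which matches how a content summary counts nested files.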



--
This message was sent by Atlassian JIRA
(v6.1#6144)
