[
https://issues.apache.org/jira/browse/FALCON-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797093#comment-13797093
]
Venkatesh Seetharam commented on FALCON-129:
--------------------------------------------
Thanks a ton [~sriksun] for taking time to review this humongous patch.
bq. 2. Possibly incorrect checkstyle warning suppression
Good catch.
bq. 3. Process involving table storage shouldn't be considered for late handling
Very good catch. Thanks!
bq. 4. FeedCleanupHandler, uses the FileStatus array for deletion.
Will do. The check is missing in the abstract handler as well; I will add it to
the delete method.
bq. 5. Would it help to have test cases added to FeedEvictor for catalog
storage type.
These cases are covered in int-tests, since mocking the static CatalogService is
hard. org.apache.falcon.catalog.TableStorageFeedEvictorIT covers both managed and
external tables.
bq. 6. From FeedEntityParser code it looks like feed entities with late arrival
section is rejected,
In the common module, parse is called but validate is not. All validations that
require services are in int-tests, hence this was not caught. Will definitely
change the entity.
bq. 7. Any specific reason to comment out this in oozie-workflow-0.3.xsd
Good question. I had to add an xs:any namespace element for hive actions in
replication, and that conflicted with the existing xs:any for sla. Hence I
commented the sla one out, as we are not using it in falcon and it is too
specific to Yahoo! and GMS.
{code}<xs:any namespace="##other" minOccurs="1" maxOccurs="1"/>{code}
There are ways to override it:
* with specific bindings in jaxb, but I thought that was unnecessary anyway
* having java actions instead of hive for import and export - we should do this
in future so it's portable across oozie
bq. This is indeed a very complex feature and patch is very clean and changes
are fairly intuitive.
Thanks! :-)
I plan to upload the cumulative patch to this jira.
> Disable Late data handling for hive tables
> ------------------------------------------
>
> Key: FALCON-129
> URL: https://issues.apache.org/jira/browse/FALCON-129
> Project: Falcon
> Issue Type: Sub-task
> Affects Versions: 0.3
> Reporter: Venkatesh Seetharam
> Assignee: Venkatesh Seetharam
> Attachments: FALCON-129.patch, FALCON-129-r1.patch
>
>
> Neither HCat nor Hive APIs expose internal stats about a given partition. The only
> way to get the partition size is to get the location of the partition on HDFS
> and then use globStatus and contentSummary APIs.
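
As a rough illustration of the glob-then-summarize approach described in the issue summary, here is a minimal sketch. It is a stand-in, not Falcon code: it uses java.nio on a local directory instead of the Hadoop FileSystem API so that it runs without a cluster. The class and method names (PartitionSizeSketch, contentLength, sizeOfMatches) and the ds=... directory layout are hypothetical. On HDFS the corresponding steps would be FileSystem.globStatus to expand the partition location pattern and FileSystem.getContentSummary(...).getLength() to get the bytes under each match.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

// Hypothetical local-filesystem analogue of the HDFS approach; not part of Falcon.
class PartitionSizeSketch {

    // Recursively sum the bytes of all regular files under one path.
    // This plays the role of getContentSummary(path).getLength() on HDFS.
    static long contentLength(Path path) throws IOException {
        try (Stream<Path> files = Files.walk(path)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(p -> {
                            try {
                                return Files.size(p);
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        })
                        .sum();
        }
    }

    // Expand a glob against the entries of the table directory (the
    // globStatus step) and total the size of every matching partition.
    static long sizeOfMatches(Path tableDir, String glob) throws IOException {
        long total = 0L;
        try (DirectoryStream<Path> matches = Files.newDirectoryStream(tableDir, glob)) {
            for (Path partition : matches) {
                total += contentLength(partition);
            }
        }
        return total;
    }
}
```

Calling PartitionSizeSketch.sizeOfMatches(tableDir, "ds=2013-10-*") would total the bytes under every partition directory whose name matches the pattern. Because this has to walk the storage rather than read a stored statistic, it is noticeably more expensive than a metadata lookup would be.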