[ 
https://issues.apache.org/jira/browse/FALCON-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796494#comment-13796494
 ] 

Srikanth Sundarrajan edited comment on FALCON-129 at 10/16/13 2:17 PM:
-----------------------------------------------------------------------

Few minor nits. 

Am trying to load up the following patches as a single one to help me review 
them holistically once (FALCON-94, FALCON-93, FALCON-90 & FALCON-129 in that 
order). Here are some observations:

# -CatalogPartition class is not included in the patch.-
# Possibly incorrect checkstyle warning supression (RetryHandler, 
AbstractRerunHandler & LateRerunHandler). Visibility doesn't seem to be 
modified. Number of arguments seems to increased instead.
{code}
    //SUSPEND CHECKSTYLE CHECK VisibilityModifierCheck
    public abstract void handleRerun(String cluster, String entityType,
                                     String entityName, String nominalTime, 
String runId, String wfId,
                                     long msgReceivedTime, String 
feedStorageType);
    //RESUME CHECKSTYLE CHECK VisibilityModifierCheck
{code} 
# Process involving table storage shouldn't be considered for late handling as 
the same is not implemented. ProcessEntityParser should include this in 
validations.
{code}
        validateLateInputs(process);
{code}
# FeedCleanupHandler, uses the FileStatus array for deletion. It might be good 
to check for null return value here.
{code}
            FileStatus[] paths = fs.globStatus(stagingPath);
            delete(cluster, feed, retention, paths);
{code}
# Would it help to have test cases added to FeedEvictor for catalog storage 
type. Looks like the test cases are for FS type.
# From FeedEntityParser code it looks like feed entities with late arrival 
section is rejected, but sample config used in tests seem to contain in 
common/src/test/resources/config/feed/hive-table-feed.xml. Is there a gap ? 
Should FeedEntityParsesTest::testParseFeedWithTable pass?
{code}
    private void validateLateData(Feed feed) throws FalconException {
        if (FeedHelper.getStorageType(feed) == Storage.TYPE.TABLE
                && feed.getLateArrival() != null) {
            throw new ValidationException("Late data handling is not supported 
for feeds with table storage! "
                    + feed.getName());
        }
    }
{code}
# Any specific reason to comment out this in oozie-workflow-0.3.xsd
{code}
            <!--<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0" 
maxOccurs="1"/>-->
{code}

This is indeed a very complex feature and patch is very clean and changes are 
fairly intuitive. 


was (Author: sriksun):
Few minor nits. 

Am trying to load up the following patches as a single one to help me review 
them holistically once (FALCON-94, FALCON-93, FALCON-90 & FALCON-129 in that 
order). Here are some observations:

# CatalogPartition class is not included in the patch.
# Possibly incorrect checkstyle warning supression (RetryHandler, 
AbstractRerunHandler & LateRerunHandler). Visibility doesn't seem to be 
modified. Number of arguments seems to increased instead.
{code}
    //SUSPEND CHECKSTYLE CHECK VisibilityModifierCheck
    public abstract void handleRerun(String cluster, String entityType,
                                     String entityName, String nominalTime, 
String runId, String wfId,
                                     long msgReceivedTime, String 
feedStorageType);
    //RESUME CHECKSTYLE CHECK VisibilityModifierCheck
{code} 
# Process involving table storage shouldn't be considered for late handling as 
the same is not implemented. ProcessEntityParser should include this in 
validations.
{code}
        validateLateInputs(process);
{code}
# FeedCleanupHandler, uses the FileStatus array for deletion. It might be good 
to check for null return value here.
{code}
            FileStatus[] paths = fs.globStatus(stagingPath);
            delete(cluster, feed, retention, paths);
{code}
# Would it help to have test cases added to FeedEvictor for catalog storage 
type. Looks like the test cases are for FS type.
# From FeedEntityParser code it looks like feed entities with late arrival 
section is rejected, but sample config used in tests seem to contain in 
common/src/test/resources/config/feed/hive-table-feed.xml. Is there a gap ? 
Should FeedEntityParsesTest::testParseFeedWithTable pass?
{code}
    private void validateLateData(Feed feed) throws FalconException {
        if (FeedHelper.getStorageType(feed) == Storage.TYPE.TABLE
                && feed.getLateArrival() != null) {
            throw new ValidationException("Late data handling is not supported 
for feeds with table storage! "
                    + feed.getName());
        }
    }
{code}
# Any specific reason to comment out this in oozie-workflow-0.3.xsd
{code}
            <!--<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0" 
maxOccurs="1"/>-->
{code}

This is indeed a very complex feature and patch is very clean and changes are 
fairly intuitive. 

> Disable Late data handling for hive tables
> ------------------------------------------
>
>                 Key: FALCON-129
>                 URL: https://issues.apache.org/jira/browse/FALCON-129
>             Project: Falcon
>          Issue Type: Sub-task
>    Affects Versions: 0.3
>            Reporter: Venkatesh Seetharam
>            Assignee: Venkatesh Seetharam
>         Attachments: FALCON-129.patch, FALCON-129-r1.patch
>
>
> HCat nor Hive APIs expose internal stats about a given partition. The only 
> way to get the partition size is to get the location of the partition on HDFS 
> and then use globStatus and contentSummary APIs.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to