[
https://issues.apache.org/jira/browse/FALCON-129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796494#comment-13796494
]
Srikanth Sundarrajan edited comment on FALCON-129 at 10/16/13 2:17 PM:
-----------------------------------------------------------------------
Few minor nits.
Am trying to load up the following patches as a single one to help me review
them holistically once (FALCON-94, FALCON-93, FALCON-90 & FALCON-129 in that
order). Here are some observations:
# -CatalogPartition class is not included in the patch.-
# Possibly incorrect checkstyle warning supression (RetryHandler,
AbstractRerunHandler & LateRerunHandler). Visibility doesn't seem to be
modified. Number of arguments seems to increased instead.
{code}
//SUSPEND CHECKSTYLE CHECK VisibilityModifierCheck
public abstract void handleRerun(String cluster, String entityType,
String entityName, String nominalTime,
String runId, String wfId,
long msgReceivedTime, String
feedStorageType);
//RESUME CHECKSTYLE CHECK VisibilityModifierCheck
{code}
# Process involving table storage shouldn't be considered for late handling as
the same is not implemented. ProcessEntityParser should include this in
validations.
{code}
validateLateInputs(process);
{code}
# FeedCleanupHandler, uses the FileStatus array for deletion. It might be good
to check for null return value here.
{code}
FileStatus[] paths = fs.globStatus(stagingPath);
delete(cluster, feed, retention, paths);
{code}
# Would it help to have test cases added to FeedEvictor for catalog storage
type. Looks like the test cases are for FS type.
# From FeedEntityParser code it looks like feed entities with late arrival
section is rejected, but sample config used in tests seem to contain in
common/src/test/resources/config/feed/hive-table-feed.xml. Is there a gap ?
Should FeedEntityParsesTest::testParseFeedWithTable pass?
{code}
private void validateLateData(Feed feed) throws FalconException {
if (FeedHelper.getStorageType(feed) == Storage.TYPE.TABLE
&& feed.getLateArrival() != null) {
throw new ValidationException("Late data handling is not supported
for feeds with table storage! "
+ feed.getName());
}
}
{code}
# Any specific reason to comment out this in oozie-workflow-0.3.xsd
{code}
<!--<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>-->
{code}
This is indeed a very complex feature and patch is very clean and changes are
fairly intuitive.
was (Author: sriksun):
Few minor nits.
Am trying to load up the following patches as a single one to help me review
them holistically once (FALCON-94, FALCON-93, FALCON-90 & FALCON-129 in that
order). Here are some observations:
# CatalogPartition class is not included in the patch.
# Possibly incorrect checkstyle warning supression (RetryHandler,
AbstractRerunHandler & LateRerunHandler). Visibility doesn't seem to be
modified. Number of arguments seems to increased instead.
{code}
//SUSPEND CHECKSTYLE CHECK VisibilityModifierCheck
public abstract void handleRerun(String cluster, String entityType,
String entityName, String nominalTime,
String runId, String wfId,
long msgReceivedTime, String
feedStorageType);
//RESUME CHECKSTYLE CHECK VisibilityModifierCheck
{code}
# Process involving table storage shouldn't be considered for late handling as
the same is not implemented. ProcessEntityParser should include this in
validations.
{code}
validateLateInputs(process);
{code}
# FeedCleanupHandler, uses the FileStatus array for deletion. It might be good
to check for null return value here.
{code}
FileStatus[] paths = fs.globStatus(stagingPath);
delete(cluster, feed, retention, paths);
{code}
# Would it help to have test cases added to FeedEvictor for catalog storage
type. Looks like the test cases are for FS type.
# From FeedEntityParser code it looks like feed entities with late arrival
section is rejected, but sample config used in tests seem to contain in
common/src/test/resources/config/feed/hive-table-feed.xml. Is there a gap ?
Should FeedEntityParsesTest::testParseFeedWithTable pass?
{code}
private void validateLateData(Feed feed) throws FalconException {
if (FeedHelper.getStorageType(feed) == Storage.TYPE.TABLE
&& feed.getLateArrival() != null) {
throw new ValidationException("Late data handling is not supported
for feeds with table storage! "
+ feed.getName());
}
}
{code}
# Any specific reason to comment out this in oozie-workflow-0.3.xsd
{code}
<!--<xs:any namespace="uri:oozie:sla:0.1" minOccurs="0"
maxOccurs="1"/>-->
{code}
This is indeed a very complex feature and patch is very clean and changes are
fairly intuitive.
> Disable Late data handling for hive tables
> ------------------------------------------
>
> Key: FALCON-129
> URL: https://issues.apache.org/jira/browse/FALCON-129
> Project: Falcon
> Issue Type: Sub-task
> Affects Versions: 0.3
> Reporter: Venkatesh Seetharam
> Assignee: Venkatesh Seetharam
> Attachments: FALCON-129.patch, FALCON-129-r1.patch
>
>
> HCat nor Hive APIs expose internal stats about a given partition. The only
> way to get the partition size is to get the location of the partition on HDFS
> and then use globStatus and contentSummary APIs.
--
This message was sent by Atlassian JIRA
(v6.1#6144)