[
https://issues.apache.org/jira/browse/HCATALOG-451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13425431#comment-13425431
]
Vandana Ayyalasomayajula commented on HCATALOG-451:
---------------------------------------------------
[~rohini], Francis and I discussed how to fix this jira. Pig 0.8 (used in our
unit tests) overrides neither abortJob() nor commitJob(), and even Pig 0.9 and
0.10 do not implement abortJob(). So if we remove the code from cleanupJob(),
all the unit tests will fail. Also, there are users who will still want to use
HCatalog 0.4 with older versions of Pig. We think it would be good to have two
different patches, one for trunk and one for branch 0.4: branch 0.4 will be
compatible with Pig 0.10 and earlier, and trunk will be compatible with
Pig 0.10 and later. For trunk, the Pig 0.8 unit-test dependency also needs to
be updated to a newer release.
Branch:
abortJob() {
    abortJobInternal();
}
commitJob() {
    registerPartitions();
}
cleanupJob() {
    abortJobInternal();
}
Trunk:
abortJob() {
    abortJobInternal();
}
commitJob() {
    registerPartitions();
}
cleanupJob() {
    // Empty, or throw an exception, since this should no longer be called.
}
> Partitions are created even when Jobs are aborted
> -------------------------------------------------
>
> Key: HCATALOG-451
> URL: https://issues.apache.org/jira/browse/HCATALOG-451
> Project: HCatalog
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.4, 0.5
> Environment: Hadoop 1.0.2, non-dynamic partitions.
> Reporter: Mithun Radhakrishnan
> Assignee: Vandana Ayyalasomayajula
> Fix For: 0.5, 0.4.1
>
> Attachments: HCATALOG-451.0.patch
>
>
> If an MR job using HCatOutputFormat fails, and
> FileOutputCommitterContainer::abortJob() is called, one would expect that
> partitions aren't created/registered with HCatalog.
> When using dynamic-partitions, one sees that this behaves correctly. But when
> static-partitions are used, partitions are created regardless of whether the
> Job succeeded or failed.
> (This manifested as a failure when the job is repeated. The retry-job fails
> to launch since the partitions already exist from the last failed run.)
> This is a result of bad code in FileOutputCommitterContainer::cleanupJob(),
> which does an unconditional partition-add. This can be fixed by adding a
> check for the output directory before adding partitions (in the
> !dynamicPartitioning case), since the directory is removed in abortJob().
> We'll have a patch for this shortly. As an aside, we ought to move the
> partition-creation into commitJob(), where it logically belongs. cleanupJob()
> is deprecated and common to both success and failure code paths.
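The guard described in the report can be sketched as follows. This is a
hypothetical illustration, with assumed names (shouldRegisterPartition,
dynamicPartitioning): real HCatalog code would use Hadoop's FileSystem API
rather than java.nio.file, but the check is the same.

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of the suggested fix: before registering a static
// partition, check whether the job's output directory still exists.
// abortJob() deletes it, so a missing directory means the job was aborted.
public class PartitionGuard {
    public static boolean shouldRegisterPartition(Path outputDir,
                                                  boolean dynamicPartitioning) {
        if (dynamicPartitioning) {
            // The dynamic-partition path already behaves correctly per the report.
            return true;
        }
        // Static partitions: register only if abortJob() did not remove the output.
        return Files.exists(outputDir);
    }
}
```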
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira