[
https://issues.apache.org/jira/browse/HCATALOG-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192924#comment-13192924
]
Mithun Radhakrishnan commented on HCATALOG-232:
-----------------------------------------------
The bad output directory is being created in
RCFileMapReduceOutputFormat::getRecordWriter(), which is being called
prematurely (i.e. before the dynamic partition values have been computed).
Here's what's changed on this front, between 0.2 and 0.3:
1. In 0.2, HCatOutputFormat::getRecordWriter() would simply return an
HCatRecordWriter instance. HCatRecordWriter's constructor would not call
RCFileMapReduceOutputFormat::getRecordWriter() in the dyn-partitions case.
Instead, that call was postponed until write() was called.
2. In 0.3, HCatOutputFormat::getRecordWriter() delegates to the underlying
RCFileMapReduceOutputFormat::getRecordWriter(), even when the Map-task is just
being set up. Since the dynamic-part-vals haven't been computed yet, a new
directory (with part-vals set to __HIVE_DEFAULT_PARTITION__) is created. This
breaks how the output-directory is walked and how the dynamic-partitions are
registered.
The key is to postpone the call to
RCFileMapReduceOutputFormat::getRecordWriter(), until the dyn-par-vals are good
and ready.
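To illustrate the idea of deferring writer creation, here is a minimal,
self-contained sketch. It does not use the HCatalog or Hadoop APIs: DirWriter
and LazyPartitionWriter are hypothetical stand-ins (not the classes in the
patch), and "creating a directory" is simulated by appending to a list. The
point is only that nothing is created at setup time, and that a writer is
opened on the first write() for a given partition value.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LazyWriterSketch {

    /** Hypothetical stand-in for the underlying writer; constructing it "creates" a directory. */
    static final class DirWriter {
        final String dir;
        final List<String> rows = new ArrayList<>();

        DirWriter(String dir, List<String> createdDirs) {
            this.dir = dir;
            createdDirs.add(dir); // the side effect that must not happen too early
        }

        void write(String row) {
            rows.add(row);
        }
    }

    /** Hypothetical wrapper: opens an underlying writer only on the first write per partition value. */
    static final class LazyPartitionWriter {
        final List<String> createdDirs = new ArrayList<>();
        private final Map<String, DirWriter> writers = new LinkedHashMap<>();

        void write(String action, String row) {
            // The partition value is known here, so the directory name is correct.
            DirWriter w = writers.computeIfAbsent(
                    action, a -> new DirWriter("action=" + a, createdDirs));
            w.write(row);
        }
    }

    public static void main(String[] args) {
        LazyPartitionWriter writer = new LazyPartitionWriter();
        System.out.println(writer.createdDirs); // prints [] (nothing created at setup)

        writer.write("click", "row1");
        writer.write("view", "row2");
        writer.write("click", "row3");
        System.out.println(writer.createdDirs); // prints [action=click, action=view]
    }
}
```

With this shape, no placeholder directory can appear, because no directory is
created before a real partition value is available.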
I'll post a patch for this shortly.
> Dynamic Partitioning broken: keys set to HIVE_DEFAULT_PARTITION
> ---------------------------------------------------------------
>
> Key: HCATALOG-232
> URL: https://issues.apache.org/jira/browse/HCATALOG-232
> Project: HCatalog
> Issue Type: Bug
> Components: mapreduce
> Affects Versions: 0.3
> Reporter: Mithun Radhakrishnan
>
> Looks like dynamic-partitioning is broken with 0.3. This is a regression from
> 0.2.
> Consider 2 tables (say source_search_Uberwald and target_search_Uberwald)
> with identical schemas and the following 4 partition keys:
> datestamp string None
> srcid string None
> action string None
> testid string None
> Consider the following Pig script (run on 0.9.2)
> a = load 'source_search_Uberwald' using org.apache.hcatalog.pig.HCatLoader();
> b = filter a by ( datestamp == '20091102' );
> store b into 'target_search_Uberwald' using
> org.apache.hcatalog.pig.HCatStorer('srcid=191740');
> One would expect that the target table would now have partitions
> corresponding to each partition in the source table (where the srcid is as
> specified).
> What one sees, however, is one of two symptoms:
> 1. In addition to the expected partitions on target, there's at least one more
> partition, with all dynamic-part-vals set to "__HIVE_DEFAULT_PARTITION__", or
> 2. No new partitions on target table. However, the target table's directory
> has subdirectories named "action=__HIVE_DEFAULT_PARTITION__", etc.