yigress opened a new pull request, #4979:
URL: https://github.com/apache/hive/pull/4979

   ### What changes were proposed in this pull request?
   Clean up the code for readability.
   Fix the failure that occurs when a parent partition path already exists during a dynamic insert into a table with multiple partition columns.
   
   
   
   ### Why are the changes needed?
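   Fix a bug:
   If a table has multiple partition columns (e.g. part1=x1/part2=y1), inserting into a new partition (part1=x1, part2=y2) makes the HCatalog FileOutputCommitterContainer throw an error saying that path part1=x1 already exists. This happens because the path existence check stops at the parent level (part1=x1) instead of checking the full partition path (part1=x1/part2=y2). For example: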
   
   pig -useHCatalog
   
   A = load 'source' using org.apache.hive.hcatalog.pig.HCatLoader();
   B = filter A by (part2 == 'y1');
   
   -- the following succeeds
   store B into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
   
   -- the following fails with a duplicate publish error
   C = filter A by (part2 == 'y2');
   store C into 'target' USING org.apache.hive.hcatalog.pig.HCatStorer();
   
   
   ```
   Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target/part1=x1, duplicate publish not possible.
   	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:243)
   	at org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler$EventProcessor.handleJobCommit(CommitterEventHandler.java:286)
   
   Caused by: org.apache.hive.hcatalog.common.HCatException : 2002 : Partition already present with given partition key values : Data already exists in /user/hive/warehouse/target/part1=x1, duplicate publish not possible.
   	at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.moveTaskOutputs(FileOutputCommitterContainer.java:564)
   	at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.registerPartitions(FileOutputCommitterContainer.java:949)
   	at org.apache.hive.hcatalog.mapreduce.FileOutputCommitterContainer.commitJob(FileOutputCommitterContainer.java:273)
   	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter.commitJob(PigOutputCommitter.java:241)
   ```
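To make the failure mode concrete, here is a minimal, self-contained Java sketch contrasting a duplicate check that stops at the parent partition directory with one that checks the full leaf partition path. It is an illustration under stated assumptions, not the code in `FileOutputCommitterContainer`: it uses no Hive or Hadoop APIs, the filesystem is modeled as a set of existing directory paths, and the class and method names (`PartitionPathCheck`, `isDuplicateParentOnly`, `isDuplicateLeaf`, `partitionPath`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not the Hive implementation.
public class PartitionPathCheck {

    // Builds "base/k1=v1/k2=v2/..." from an ordered partition spec.
    public static String partitionPath(String base, Map<String, String> spec) {
        StringBuilder sb = new StringBuilder(base);
        for (Map.Entry<String, String> e : spec.entrySet()) {
            sb.append('/').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    // Parent-only check (the buggy behavior described above): reports a
    // duplicate as soon as the first-level partition directory exists.
    // Assumes a non-empty spec.
    public static boolean isDuplicateParentOnly(Set<String> existing, String base,
                                                Map<String, String> spec) {
        Map.Entry<String, String> first = spec.entrySet().iterator().next();
        return existing.contains(base + "/" + first.getKey() + "=" + first.getValue());
    }

    // Leaf check (the intended behavior): only an existing full partition
    // path counts as a duplicate publish.
    public static boolean isDuplicateLeaf(Set<String> existing, String base,
                                          Map<String, String> spec) {
        return existing.contains(partitionPath(base, spec));
    }

    public static void main(String[] args) {
        String base = "/user/hive/warehouse/target";
        // Directories left behind by the first (successful) store of part2=y1.
        Set<String> existing = Set.of(base + "/part1=x1", base + "/part1=x1/part2=y1");

        // New partition written by the second store.
        Map<String, String> spec = new LinkedHashMap<>();
        spec.put("part1", "x1");
        spec.put("part2", "y2");

        System.out.println("parent-only check: " + isDuplicateParentOnly(existing, base, spec));
        System.out.println("leaf check: " + isDuplicateLeaf(existing, base, spec));
    }
}
```

Running `main` prints `parent-only check: true` and `leaf check: false`: the parent-only check rejects part1=x1/part2=y2 merely because part1=x1 already exists, while the leaf check lets the new partition through and would still reject a real republish of part1=x1/part2=y1.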
   
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   ### Is the change a dependency upgrade?
   No
   
   
   ### How was this patch tested?
   Updated the unit tests to cover the use case affected by the bug, and ran:
   mvn clean test -Dtest=TestHCatExternalDynamicPartitioned,TestHCatDynamicPartitioned,TestHCatPartitioned,TestHCatNonPartitioned
   
   Also tested locally with `pig -useHCatalog`.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
