[ 
https://issues.apache.org/jira/browse/HCATALOG-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450146#comment-13450146
 ] 

Arup Malakar commented on HCATALOG-490:
---------------------------------------

Travis, I was already working on this. If you don't mind and haven't started 
yet, may I finish it?
                
> HCatStorer() throws error when the same partition key is present in records 
> in more than one task running as part of the same job
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HCATALOG-490
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-490
>             Project: HCatalog
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Travis Crawford
>
> I have a file with ~240 MB of data. One of the columns in the input data is 
> 'action', and its value is either 1 or 2. 
> When I try to load it using the following script:
> {code}
> in = load '/user/malakar/page_views_20000000_0/part-00000' USING PigStorage(',') AS (user:chararray, timespent:int, query_term:chararray, ip_addr:int, estimated_revenue:int, page_info:chararray, action:int);
> STORE in into 'page_views_20000000_0' USING org.apache.hcatalog.pig.HCatStorer();
> {code}
> It throws the following exception:
> {quote}
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1 already exists
>  at org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
>  at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
>  at org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
>  at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>  at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>  at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
>  at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
>  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
>  at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>  at java.security.AccessController.doPrivileged(Native Method)
>  at javax.security.auth.Subject.doAs(Subject.java:396)
>  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>  at org.apache.hadoop.mapred.Child.main(Child.java:249)
> {quote}
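
For illustration only, here is a rough Python simulation of the failure mode the trace suggests: each task runs an "output directory must not exist" check for a partition directory that another task in the same job may already have created. This is a sketch under assumptions, not HCatalog code. The names `check_output_specs`, `run_task`, and `simulate` are hypothetical, the error class merely stands in for `FileAlreadyExistsException`, and directory creation is folded into the check to keep the sketch short.

```python
import os
import tempfile


class FileAlreadyExistsError(Exception):
    """Stand-in for org.apache.hadoop.mapred.FileAlreadyExistsException."""


def check_output_specs(output_dir):
    # Assumed behaviour, modelled on the trace: fail if the directory exists.
    if os.path.exists(output_dir):
        raise FileAlreadyExistsError(
            "Output directory %s already exists" % output_dir)
    # Simplification: a real writer would create the directory separately.
    os.makedirs(output_dir)


def run_task(base_dir, actions):
    """Hypothetical task: checks each partition directory once per task."""
    seen = set()
    for action in actions:
        partition_dir = os.path.join(base_dir, "action=%d" % action)
        if partition_dir not in seen:
            check_output_specs(partition_dir)
            seen.add(partition_dir)


def simulate():
    """Run two tasks of one 'job' that share a partition key value."""
    base_dir = tempfile.mkdtemp()
    run_task(base_dir, [1, 2, 1])  # first task creates action=1 and action=2
    try:
        run_task(base_dir, [1])    # second task hits the existing action=1
    except FileAlreadyExistsError as err:
        return str(err)
    return None
```

Calling `simulate()` returns the error message raised by the second task, which mirrors the situation in the report where both values of 'action' occur in more than one task's input.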
