[jira] [Commented] (HCATALOG-490) HCatStorer() throws error when the same partition key is present in records in more than one tasks running as part of the same job

Arup Malakar (JIRA) Thu, 06 Sep 2012 16:25:09 -0700

    [ 
https://issues.apache.org/jira/browse/HCATALOG-490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13450146#comment-13450146
 ]


Arup Malakar commented on HCATALOG-490:
---------------------------------------

Travis, I was already working on this. If you don't mind and haven't started 
can I finish this?
                
> HCatStorer()  throws error when the same partition key is present in records 
> in more than one  tasks running as part of the same job
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HCATALOG-490
>                 URL: https://issues.apache.org/jira/browse/HCATALOG-490
>             Project: HCatalog
>          Issue Type: Bug
>            Reporter: Arup Malakar
>            Assignee: Travis Crawford
>
> I have a file with ~240MB data. One of the columns in input data was 'action' 
> and the value is either 1 or 2. 
> When I try to load it using the following script:
> {code}
> in = load '/user/malakar/page_views_20000000_0/part-00000' USING 
> PigStorage(',') AS (user:chararray, timespent:int, query_term:chararray, 
> ip_addr:int, estimated_revenue:int, page_info:chararray, action:int);
> STORE in into 'page_views_20000000_0' USING 
> org.apache.hcatalog.pig.HCatStorer();
> {code}
> It throws the following exception:
> {quote}
> org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory 
> hdfs://tasktrackerhost:8020/user/hive/warehouse/page_views_20000000_0/_DYN0.7622108853605496/action=1
>  already exists at 
> org.apache.hadoop.mapred.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:121)
>  at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:200)
>  at 
> org.apache.hcatalog.mapreduce.FileRecordWriterContainer.write(FileRecordWriterContainer.java:52)
>  at org.apache.hcatalog.pig.HCatBaseStorer.putNext(HCatBaseStorer.java:235) 
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
>  at 
> org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
>  at 
> org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
>  at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
>  at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144) at 
> org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764) at 
> org.apache.hadoop.mapred.MapTask.run(MapTask.java:370) at 
> org.apache.hadoop.mapred.Child$4.run(Child.java:255) at 
> java.security.AccessController.doPrivileged(Native Method) at 
> javax.security.auth.Subject.doAs(Subject.java:396) at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
>  at org.apache.hadoop.mapred.Child.main(Child.java:249) 
> {quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HCATALOG-490) HCatStorer() throws error when the same partition key is present in records in more than one tasks running as part of the same job

Reply via email to