[ https://issues.apache.org/jira/browse/PIG-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434548#comment-13434548 ]

Rohini Palaniswamy commented on PIG-2578:
-----------------------------------------

I did some debugging with and without PIG-2578. Multiple stores using PigStorage 
worked fine in both cases. This is because before every getOutputFormat call, 
there is a setLocation call with a copy of the JobContext or TaskAttemptContext, 
and that copy is passed to the getOutputCommitter(), getRecordWriter() or 
checkOutputSpecs() calls. So the output format actually runs with the correct 
configuration, and multiple store commands do not necessarily get messed up. 
The corner case I see is this: if one store instance sets a configuration 
property to a specific value and another store instance does not set that 
property at all, the second instance will still see the value that the first 
instance put into its copy of the job (without PIG-2578).
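A simplified model of that corner case, for illustration only: a Map<String, String> stands in for the Hadoop configuration, and Store, setLocation, configForBShared, configForBIsolated and the key example.store.option are hypothetical names, not Pig APIs. The "leaky" path assumes the first store's writes reach the configuration the second store's copy is taken from, which is roughly the pre-PIG-2578 situation described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation: a Map stands in for a Hadoop Configuration and
// Store/setLocation are illustrative names, not Pig APIs.
public class StoreConfigLeak {

    interface Store {
        void setLocation(Map<String, String> conf);
    }

    // Store A sets its output dir plus one extra, store-specific property.
    static final Store STORE_A = conf -> {
        conf.put("mapred.output.dir", "/out/keyvals_lt200");
        conf.put("example.store.option", "a-only"); // hypothetical key
    };

    // Store B sets only its output dir and never touches example.store.option.
    static final Store STORE_B = conf -> conf.put("mapred.output.dir", "/out/keyvals_ge200");

    // Leaky model (assumed approximation of the pre-PIG-2578 behavior):
    // A writes into the configuration that B's copy is later taken from.
    static Map<String, String> configForBShared(Map<String, String> job) {
        STORE_A.setLocation(job);
        Map<String, String> copyB = new HashMap<>(job);
        STORE_B.setLocation(copyB);
        return copyB;
    }

    // Isolated model: each store starts from its own copy of the original job.
    static Map<String, String> configForBIsolated(Map<String, String> job) {
        Map<String, String> copyA = new HashMap<>(job);
        STORE_A.setLocation(copyA);
        Map<String, String> copyB = new HashMap<>(job);
        STORE_B.setLocation(copyB);
        return copyB;
    }

    public static void main(String[] args) {
        Map<String, String> leaky = configForBShared(new HashMap<>());
        Map<String, String> isolated = configForBIsolated(new HashMap<>());
        // Both models point B at its own directory...
        System.out.println(leaky.get("mapred.output.dir"));       // /out/keyvals_ge200
        System.out.println(isolated.get("mapred.output.dir"));    // /out/keyvals_ge200
        // ...but only the leaky model carries A's property into B's config.
        System.out.println(leaky.get("example.store.option"));    // a-only
        System.out.println(isolated.get("example.store.option")); // null
    }
}
```

The output directory comes out right in both models because B always sets it; only the key B never sets shows the leak, which matches why PigStorage stores appeared to work.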

When this JIRA was filed, the actual problem was in the HCat code. During 
setStoreLocation it set mapred.output.dir and a lot of other properties on the 
front end but not on the backend: 
http://svn.apache.org/viewvc/incubator/hcatalog/branches/branch-0.4/src/java/org/apache/hcatalog/pig/HCatStorer.java?revision=1325867&view=markup
If it had also set mapred.output.dir on the backend, it would have worked 
fine. It was later fixed to do so.
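A hypothetical sketch of the two patterns, not the actual HCatStorer code or the Pig StoreFunc signature: a Map stands in for the job configuration passed to setStoreLocation, a boolean marks front end versus backend, and LocationSetter, frontendOnly, always and backendOutputDir are illustrative names. It also assumes, for simplicity, that on the front end both stores write into one shared job so the last store wins.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation, not the HCatStorer or Pig StoreFunc API.
public class BackendStoreLocation {

    static final String OUTPUT_DIR = "mapred.output.dir";

    interface LocationSetter {
        void setStoreLocation(String location, Map<String, String> conf, boolean frontend);
    }

    // Buggy pattern: the output dir is set only during front-end planning.
    static void frontendOnly(String location, Map<String, String> conf, boolean frontend) {
        if (frontend) {
            conf.put(OUTPUT_DIR, location);
        }
    }

    // Fixed pattern: the output dir is set on every call, front end and backend.
    static void always(String location, Map<String, String> conf, boolean frontend) {
        conf.put(OUTPUT_DIR, location);
    }

    // Front end (simplified): both stores write into one shared job, last one wins.
    // Backend: the store gets a copy of that job and another setStoreLocation call.
    static String backendOutputDir(String location, LocationSetter setter) {
        Map<String, String> job = new HashMap<>();
        setter.setStoreLocation("keyvals_lt200", job, true);
        setter.setStoreLocation("keyvals_ge200", job, true);
        Map<String, String> copy = new HashMap<>(job);
        setter.setStoreLocation(location, copy, false);
        return copy.get(OUTPUT_DIR);
    }

    public static void main(String[] args) {
        // Front-end-only setter: the lt200 store inherits the ge200 directory.
        System.out.println(backendOutputDir("keyvals_lt200", BackendStoreLocation::frontendOnly)); // keyvals_ge200
        // Always-set setter: the lt200 store gets its own directory back.
        System.out.println(backendOutputDir("keyvals_lt200", BackendStoreLocation::always)); // keyvals_lt200
    }
}
```

Because the backend call runs on a per-store copy, setting the property unconditionally is enough to give each store its own output directory, whatever the shared job held.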
                
> Multiple Store-commands mess up mapred.output.dir.
> --------------------------------------------------
>
>                 Key: PIG-2578
>                 URL: https://issues.apache.org/jira/browse/PIG-2578
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.2
>            Reporter: Mithun Radhakrishnan
>            Assignee: Daniel Dai
>             Fix For: 0.10.0, 0.11
>
>         Attachments: PIG-2578-1.patch
>
>
> When one runs a pig-script with multiple storers, one sees the following:
> 1. When run as a script, Pig launches a single job.
> 2. PigOutputCommitter::setupJob() calls the 
> underlyingOutputCommitter::setupJob(), once for each storer. But the 
> mapred.output.dir is the same for both calls, even though the storers write 
> to different locations. 
> This was originally seen in HCATALOG-276, when HCatalog's end-to-end tests 
> are run against Pig.
> (https://issues.apache.org/jira/browse/HCATALOG-276)
> Sample pig-script (nearly identical to HCatalog's Pig_Checkin_4 test):
> a = load 'keyvals' using org.apache.hcatalog.pig.HCatLoader();
> split a into b if key<200, c if key >=200;
> store b into 'keyvals_lt200' using org.apache.hcatalog.pig.HCatStorer();
> store c into 'keyvals_ge200' using org.apache.hcatalog.pig.HCatStorer();
> I've suggested a workaround in HCat for the time being, but I think this 
> might be something that needs fixing in Pig.
> Thanks.

--
This message is automatically generated by JIRA.
