[ https://issues.apache.org/jira/browse/PIG-2578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438929#comment-13438929 ]

Raghu Angadi commented on PIG-2578:
-----------------------------------

Thanks for the analysis, Rohini. +1 for reverting this patch. 

For the larger issue, I think Pig should clearly define the contract for the 
job/conf passed to setLocation() and setStoreLocation() so that users can 
implement their StoreFuncs properly. I would suggest resisting the temptation 
to say "this method might be called any number of times" (variants of this 
appear in multiple places in Pig's interfaces). While this made UDF 
implementors think twice about what they were doing, it also allowed Pig to 
implement workarounds rather than proper fixes (e.g., why is 
setStoreLocation() called in so many places?).

                
> Multiple Store-commands mess up mapred.output.dir.
> --------------------------------------------------
>
>                 Key: PIG-2578
>                 URL: https://issues.apache.org/jira/browse/PIG-2578
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.8.1, 0.9.2
>            Reporter: Mithun Radhakrishnan
>            Assignee: Daniel Dai
>             Fix For: 0.10.0, 0.11
>
>         Attachments: PIG-2578-1.patch
>
>
> When one runs a pig-script with multiple storers, one sees the following:
> 1. When run as a script, Pig launches a single job.
> 2. PigOutputCommitter::setupJob() calls 
> underlyingOutputCommitter::setupJob() once for each storer. But 
> mapred.output.dir is the same for both calls, even though the storers write 
> to different locations. 
> This was originally seen in HCATALOG-276, when HCatalog's end-to-end tests 
> are run against Pig.
> (https://issues.apache.org/jira/browse/HCATALOG-276)
> Sample pig-script (nearly identical to HCatalog's Pig_Checkin_4 test):
> a = load 'keyvals' using org.apache.hcatalog.pig.HCatLoader();
> split a into b if key<200, c if key >=200;
> store b into 'keyvals_lt200' using org.apache.hcatalog.pig.HCatStorer();
> store c into 'keyvals_ge200' using org.apache.hcatalog.pig.HCatStorer();
> I've suggested a workaround in HCat for the time being, but I think this 
> might be something that needs fixing in Pig.
> Thanks.
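The report does not say why both setupJob() calls see the same mapred.output.dir. One plausible mechanism, stated here purely as an assumption and not as Pig's actual code path, is that every storer's location is written into one shared configuration before the committers are set up, so each committer sees whichever location was written last. The minimal Java sketch that follows simulates this with a plain Map. MultiStoreSketch, setStoreLocation() and setupJob() are hypothetical stand-ins, not Pig or Hadoop APIs; only the key name mapred.output.dir and the two output locations come from the report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical simulation; none of these methods are Pig or Hadoop APIs.
public class MultiStoreSketch {
    static final String OUTPUT_DIR = "mapred.output.dir";

    // Stand-in for a StoreFunc's setStoreLocation(): records the store's
    // location in whatever configuration it is handed.
    static void setStoreLocation(Map<String, String> conf, String location) {
        conf.put(OUTPUT_DIR, location);
    }

    // Stand-in for an output committer's setupJob(): reports the output
    // directory visible in its configuration.
    static String setupJob(Map<String, String> conf) {
        return conf.get(OUTPUT_DIR);
    }

    // Assumed buggy pattern: all stores mutate one shared configuration, and
    // the committers are set up afterwards, so each sees the last location.
    static List<String> sharedConf(List<String> locations) {
        Map<String, String> conf = new HashMap<>();
        for (String loc : locations) {
            setStoreLocation(conf, loc);
        }
        List<String> seen = new ArrayList<>();
        for (int i = 0; i < locations.size(); i++) {
            seen.add(setupJob(conf));
        }
        return seen;
    }

    // Isolated pattern: each store gets its own copy of the base
    // configuration, and its committer is set up against that copy.
    static List<String> perStoreConf(Map<String, String> base, List<String> locations) {
        List<Map<String, String>> confs = new ArrayList<>();
        for (String loc : locations) {
            Map<String, String> conf = new HashMap<>(base);
            setStoreLocation(conf, loc);
            confs.add(conf);
        }
        List<String> seen = new ArrayList<>();
        for (Map<String, String> conf : confs) {
            seen.add(setupJob(conf));
        }
        return seen;
    }

    public static void main(String[] args) {
        List<String> locations = List.of("keyvals_lt200", "keyvals_ge200");
        System.out.println(sharedConf(locations));
        System.out.println(perStoreConf(new HashMap<>(), locations));
    }
}
```

Running main() prints [keyvals_ge200, keyvals_ge200] for the shared-configuration case and [keyvals_lt200, keyvals_ge200] for the per-store case. The second behavior is the kind of per-storer isolation that a clearly defined contract for the job/conf passed to setStoreLocation() would make explicit.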

--
This message is automatically generated by JIRA.