[
https://issues.apache.org/jira/browse/PIG-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Olga Natkovich reassigned PIG-1724:
-----------------------------------
Assignee: Richard Ding
> Multiquery optimization miscalculates the parallelism and results in extra 0
> bytes files (Pig 0.7 and 0.8)
> ----------------------------------------------------------------------------------------------------------
>
> Key: PIG-1724
> URL: https://issues.apache.org/jira/browse/PIG-1724
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.7.0, 0.8.0
> Reporter: Viraj Bhat
> Assignee: Richard Ding
> Fix For: 0.7.0, 0.8.0
>
> Attachments: samplepig001.in
>
>
> We have found an issue with Pig 0.8 and Pig 0.7 when using Multiquery
> optimization. It produces more number of part files than required. Please
> observe that the GROUP ALL is a dummy in this case.
> {code}
> record002 = LOAD 'samplepig001.in' AS (id:chararray,num:int);
> f_records002= FILTER record002 BY num!=50000;
> group01 = GROUP f_records002 ALL PARALLEL 1;
> STORE group01 INTO 'pig_out_direc_SET1';
> set2 = FILTER f_records002 BY num!=200002;
> set2_Group = GROUP set2 ALL PARALLEL 1;
> STORE set2 INTO 'pig_out_direc_SET2';
> set3 = FILTER f_records002 BY num!=100001;
> set3_Group= GROUP set3 BY id PARALLEL 40;
> --set3_Rec4= FILTER set3_Group by num!=5000000;
> STORE set3_Group INTO 'pig_out_direc_SET3';
> {code}
> When run in Pig 0.8 it results in the following output.
> {quote}
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET1
> ...
> Found 40 items
> rw------- 3 viraj users 0 2010-11-13 02:09
> /user/viraj/pig_out_direc_SET1/part-r-00000
> ...
> ...
> -rw------- 3 viraj users 0 2010-11-13 02:09
> /user/viraj/pig_out_direc_SET1/part-r-00039
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET2
> Found 1 items
> -rw------- 3 viraj users 110 2010-11-13 02:08
> /user/viraj/pig_out_direc_SET2/part-m-00000
> $ hadoop fs -ls /user/viraj/pig_out_direc_SET3
> Found 40 items
> -rw------- 3 viraj users 0 2010-11-13 02:09
> /user/viraj/pig_out_direc_SET3/part-r-00000
> ...
> ...
> -rw------- 3 viraj users 0 2010-11-13 02:09
> /user/viraj/pig_out_direc_SET3/part-r-00039
> {quote}
> Viraj
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.