[ 
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheolsoo Park updated PIG-3813:
-------------------------------

    Attachment: PIG-3813-1.patch

Attaching a patch. Here is what I found:

# PushUpFilter moves the filter (pWeek) above the foreach (gTWeek).
# Since pWeek is a direct predecessor of the rank (pWeekRanked), PushUpFilter
changes the output schema of pWeekRanked from rank_pWeek (uid:18) to
rank_gTWeek (uid:24).
# After this, ColumnPruneVisitor visits the 2nd foreach (gpWeekRanked). It
checks whether the column $0 exists in the input schema of gpWeekRanked, which
is equivalent to the output schema of pWeekRanked, as follows:
{code}
if (!inputUids.contains(project.getFieldSchema().uid))
    innerLoadsToRemove.add(innerLoad);
{code}
The problem is that rank_pWeek (uid:18) is no longer in the input schema of
gpWeekRanked, but rank_gTWeek (uid:24) is. So ColumnPruneVisitor removes the
column $0 from gpWeekRanked.
# Now gpWeekRanked generates empty tuples.

The attached patch changes the condition in ColumnPruneVisitor as follows:
{code}
LogicalSchema.LogicalFieldSchema fs =
    project.findReferent().getSchema().getField(project.getColNum());
if (!inputUids.contains(fs.uid)) {
    innerLoadsToRemove.add(innerLoad);
}
{code}
In other words, it takes the uid from the schema of the referent rather than
from the project's own field schema in LOGenerate. This way, any change to the
predecessor's schema is reflected correctly.
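
To make the difference between the two conditions concrete, here is a small standalone sketch. It is a toy model only and does not use Pig's classes: Field, Project, uidsOf and the two prunedBy... methods are hypothetical names, and the uids for WEEK and PERIOD are invented. Only rank_pWeek (uid:18) and rank_gTWeek (uid:24) come from the analysis above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: Field and Project are hypothetical stand-ins, not Pig classes.
class UidPruneSketch {

    // A column with an alias and a uid, loosely modelled on a logical field schema.
    static final class Field {
        final String alias;
        final long uid;

        Field(String alias, long uid) {
            this.alias = alias;
            this.uid = uid;
        }
    }

    // A projection that remembers the field it saw when the plan was built and
    // the column number it reads from its input (the referent).
    static final class Project {
        final Field cachedField;
        final int colNum;

        Project(Field cachedField, int colNum) {
            this.cachedField = cachedField;
            this.colNum = colNum;
        }
    }

    // Collects the uids present in a schema.
    static Set<Long> uidsOf(List<Field> schema) {
        Set<Long> uids = new HashSet<>();
        for (Field f : schema) {
            uids.add(f.uid);
        }
        return uids;
    }

    // Original condition: prune when the projection's own (possibly stale) uid is missing.
    static boolean prunedByOwnUid(Project p, Set<Long> inputUids) {
        return !inputUids.contains(p.cachedField.uid);
    }

    // Patched condition: prune when the uid found in the referent's current schema is missing.
    static boolean prunedByReferentUid(Project p, List<Field> referentSchema, Set<Long> inputUids) {
        Field fs = referentSchema.get(p.colNum);
        return !inputUids.contains(fs.uid);
    }

    public static void main(String[] args) {
        // pWeekRanked's output after PushUpFilter; the uids for WEEK and PERIOD are invented.
        List<Field> referentSchema = Arrays.asList(
                new Field("rank_gTWeek", 24L),
                new Field("WEEK", 1L),
                new Field("PERIOD", 4L));
        Set<Long> inputUids = uidsOf(referentSchema);

        // $0 in gpWeekRanked still carries the pre-optimization field.
        Project dollarZero = new Project(new Field("rank_pWeek", 18L), 0);

        System.out.println("own uid check prunes $0: " + prunedByOwnUid(dollarZero, inputUids));
        System.out.println("referent uid check prunes $0: "
                + prunedByReferentUid(dollarZero, referentSchema, inputUids));
    }
}
```

In this model the first check reports that $0 should be pruned, because uid 18 is no longer in the input, while the second check keeps it, because column 0 of the referent now resolves to uid 24.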

I have not worked on this area of the code before, so please let me know if
this fix is not appropriate.

> PigGenericMapBase runPipeline() method returns empty tuples and goes into
> infinite loop under certain conditions
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-3813
>                 URL: https://issues.apache.org/jira/browse/PIG-3813
>             Project: Pig
>          Issue Type: Bug
>          Components: impl
>    Affects Versions: 0.12.0
>            Reporter: Suhas Satish
>            Assignee: Cheolsoo Park
>            Priority: Critical
>         Attachments: PIG-3813-1.patch, test_data.txt
>
>
> When the following script is run, Pig goes into an infinite loop. This was
> reproduced on Pig trunk as of March 12, 2014 on Apache Hadoop 1.2.
> test_data.txt has been attached.
> test.pig:
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int, 
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---------------------------------------------------
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> therefore never exits.
>  



