[
https://issues.apache.org/jira/browse/PIG-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Cheolsoo Park updated PIG-3813:
-------------------------------
Attachment: PIG-3813-1.patch
Attaching a patch. Here is what I found:
# PushUpFilter moves the filter (pWeek) above the foreach (gTWeek).
# Since pWeek is a direct predecessor of the rank (pWeekRanked), the
PushUpFilter changes the output schema of pWeekRanked from rank_pWeek (uid:18)
to rank_gTWeek (uid:24).
# After this, ColumnPruneVisitor visits the 2nd foreach (gpWeekRanked). It
checks whether the column $0 exists in the input schema of gpWeekRanked, which
is equivalent to the output schema of pWeekRanked, as follows:
{code}
if (!inputUids.contains(project.getFieldSchema().uid))
    innerLoadsToRemove.add(innerLoad);
{code}
The problem is that rank_pWeek (uid:18) is no longer in the input schema of
gpWeekRanked, but rank_gTWeek (uid:24) is. So ColumnPruneVisitor removes the
column $0 from gpWeekRanked.
# Now gpWeekRanked generates empty tuples.
The attached patch changes the condition in ColumnPruneVisitor as follows:
{code}
LogicalSchema.LogicalFieldSchema fs =
    project.findReferent().getSchema().getField(project.getColNum());
if (!inputUids.contains(fs.uid)) {
    innerLoadsToRemove.add(innerLoad);
}
{code}
Basically, it uses the schema of the referent instead of that of the field
itself in LOGenerate. This way, any changes in its predecessor are reflected
correctly.
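To illustrate the difference between the two checks, here is a minimal, self-contained sketch. It does not use Pig's real classes: StaleUidSketch, Field, Project, cachedField and referentSchema are hypothetical stand-ins, and the schemas are reduced to the single rank column with the uids mentioned above (18 and 24).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-ins for the logical plan classes; NOT Pig's real API.
class StaleUidSketch {

    /** A field schema reduced to its alias and uid. */
    static final class Field {
        final String alias;
        final long uid;

        Field(String alias, long uid) {
            this.alias = alias;
            this.uid = uid;
        }
    }

    /** A projection that cached its field schema when the plan was built. */
    static final class Project {
        final int colNum;
        final Field cachedField;     // may go stale after an optimizer rewrite
        List<Field> referentSchema;  // current output schema of the predecessor

        Project(int colNum, Field cachedField) {
            this.colNum = colNum;
            this.cachedField = cachedField;
        }
    }

    /** Old condition: looks up the uid cached on the projection itself. */
    static boolean prunedByCachedUid(Project p, Set<Long> inputUids) {
        return !inputUids.contains(p.cachedField.uid);
    }

    /** Patched condition: looks up the uid in the referent's current schema. */
    static boolean prunedByReferentUid(Project p, Set<Long> inputUids) {
        Field fs = p.referentSchema.get(p.colNum);
        return !inputUids.contains(fs.uid);
    }

    public static void main(String[] args) {
        // The projection was built while $0 was still rank_pWeek (uid:18).
        Project project = new Project(0, new Field("rank_pWeek", 18L));

        // After PushUpFilter, the rank outputs rank_gTWeek (uid:24) instead.
        project.referentSchema = Arrays.asList(new Field("rank_gTWeek", 24L));
        Set<Long> inputUids = new HashSet<>(Arrays.asList(24L));

        System.out.println(prunedByCachedUid(project, inputUids));   // true
        System.out.println(prunedByReferentUid(project, inputUids)); // false
    }
}
```

With the cached uid the column is wrongly marked for removal, while the lookup through the referent keeps it.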
I have not worked on this area of the code before, so please let me know if
this fix is not appropriate.
> PigGenericMapBase runPipeline() method returns empty tuples and goes into
> infinite loop under certain conditions
> ------------------------------------------------------------------------------------------------------------------
>
> Key: PIG-3813
> URL: https://issues.apache.org/jira/browse/PIG-3813
> Project: Pig
> Issue Type: Bug
> Components: impl
> Affects Versions: 0.12.0
> Reporter: Suhas Satish
> Assignee: Cheolsoo Park
> Priority: Critical
> Attachments: PIG-3813-1.patch, test_data.txt
>
>
> When the following script is run, Pig goes into an infinite loop. This was
> reproduced on Pig trunk as of March 12, 2014 on Apache Hadoop 1.2.
> test_data.txt has been attached.
> test.pig
> tWeek = LOAD '/tmp/test_data.txt' USING PigStorage ('|') AS (WEEK:int,
> DESCRIPTION:chararray, END_DATE:chararray, PERIOD:int);
> gTWeek = FOREACH tWeek GENERATE WEEK AS WEEK, PERIOD AS PERIOD;
> pWeek = FILTER gTWeek BY PERIOD == 201312;
> pWeekRanked = RANK pWeek BY WEEK ASC DENSE;
> gpWeekRanked = FOREACH pWeekRanked GENERATE $0;
> store gpWeekRanked into 'gpWeekRanked';
> describe gpWeekRanked;
> ---------------------------------------------------
> The res object of class Result gets its value from leaf.getNextTuple().
> This returns an empty tuple
> ()
> with STATUS_OK.
> So the while(true) loop never receives an End of Processing (EOP) status and
> never exits.
>