[ https://issues.apache.org/jira/browse/PIG-2124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13050051#comment-13050051 ]
Daniel Dai commented on PIG-2124:
---------------------------------
This is related to the ColumnMapKeyPrune optimization rule. If we disable this
rule using "-t ColumnMapKeyPrune", the error goes away.
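
As a sketch of that workaround, assuming the script from the description is
saved as join_scores.pig (a hypothetical file name), the rule can be turned off
on the command line:

```shell
# Run the script with the ColumnMapKeyPrune optimizer rule disabled.
# join_scores.pig is a hypothetical file name for the script in the description.
pig -t ColumnMapKeyPrune join_scores.pig
```

This only sidesteps the problem for that run; it does not fix the rule itself.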
> Script never ending when joining from the same source
> -----------------------------------------------------
>
> Key: PIG-2124
> URL: https://issues.apache.org/jira/browse/PIG-2124
> Project: Pig
> Issue Type: Bug
> Affects Versions: 0.8.1
> Reporter: Tristan Croiset
> Assignee: Daniel Dai
>
> Depending on which fields are used in the output, the following script either
> works perfectly fine or never ends.
> The input ("scores" file) contains:
> ------------------
> test1;0.1
> test2;0.9
> test1;0.3
> ------------------
> ------------------------------------------------------------------------------
> score_list = LOAD 'scores' USING PigStorage(';')
> AS (word: chararray, score: double);
> score_list_ = FOREACH score_list GENERATE
> word,
> score,
> 0 AS joinField;
> group_score = GROUP score_list ALL;
> sum_score = FOREACH group_score GENERATE
> 0 AS joinField,
> SUM(score_list.score) as scoreTotal;
> score_with_sum = JOIN score_list_ BY joinField, sum_score BY joinField;
> out = FOREACH score_with_sum GENERATE word, (score / scoreTotal);
> DUMP out;
> ------------------------------------------------------------------------------
> This works fine.
> But if I change "out" to: out = FOREACH score_with_sum GENERATE word;
> then the script never ends and the output keeps repeating lines like:
> 2011-06-15 15:00:22,536 [SpillThread] INFO org.apache.hadoop.mapred.MapTask
> - Finished spill 24
> 2011-06-15 15:00:22,889 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> Spilling map output: record full = true
> 2011-06-15 15:00:22,889 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> bufstart = 65535810; bufend = 68157240; bufvoid = 99614720
> 2011-06-15 15:00:22,889 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> kvstart = 327661; kvend = 262124; length = 327680
> 2011-06-15 15:00:22,994 [SpillThread] INFO org.apache.hadoop.mapred.MapTask
> - Finished spill 25
> 2011-06-15 15:00:23,345 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> Spilling map output: record full = true
> 2011-06-15 15:00:23,345 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> bufstart = 68157240; bufend = 70778670; bufvoid = 99614720
> 2011-06-15 15:00:23,345 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> kvstart = 262124; kvend = 196587; length = 327680
> 2011-06-15 15:00:23,447 [SpillThread] INFO org.apache.hadoop.mapred.MapTask
> - Finished spill 26
> 2011-06-15 15:00:23,794 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> Spilling map output: record full = true
> 2011-06-15 15:00:23,794 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> bufstart = 70778670; bufend = 73400100; bufvoid = 99614720
> 2011-06-15 15:00:23,794 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> kvstart = 196587; kvend = 131050; length = 327680
> 2011-06-15 15:00:23,896 [SpillThread] INFO org.apache.hadoop.mapred.MapTask
> - Finished spill 27
> 2011-06-15 15:00:24,243 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> Spilling map output: record full = true
> 2011-06-15 15:00:24,243 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> bufstart = 73400100; bufend = 76021530; bufvoid = 99614720
> 2011-06-15 15:00:24,243 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> kvstart = 131050; kvend = 65513; length = 327680
> 2011-06-15 15:00:24,346 [SpillThread] INFO org.apache.hadoop.mapred.MapTask
> - Finished spill 28
> 2011-06-15 15:00:24,692 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> Spilling map output: record full = true
> 2011-06-15 15:00:24,692 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> bufstart = 76021530; bufend = 78642970; bufvoid = 99614720
> 2011-06-15 15:00:24,693 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> kvstart = 65513; kvend = 327657; length = 327680
> 2011-06-15 15:00:24,793 [SpillThread] INFO org.apache.hadoop.mapred.MapTask
> - Finished spill 29
> 2011-06-15 15:00:25,144 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> Spilling map output: record full = true
> 2011-06-15 15:00:25,144 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> bufstart = 78642970; bufend = 81264400; bufvoid = 99614720
> 2011-06-15 15:00:25,144 [Thread-13] INFO org.apache.hadoop.mapred.MapTask -
> kvstart = 327657; kvend = 262120; length = 327680
> P.S. I know it's possible to refactor the script using casting to scalar ;)
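
For reference, the scalar refactoring mentioned in the P.S. might look like the
sketch below. It is not part of the original report and has not been checked
against 0.8.1. It assumes the same "scores" input and relies on Pig's casting
of a single-row relation to a scalar, which removes the constant-key JOIN
altogether.

```pig
-- Load the same input as in the report.
score_list = LOAD 'scores' USING PigStorage(';')
             AS (word: chararray, score: double);

-- Collapse everything into one group and compute the total score.
group_score = GROUP score_list ALL;
sum_score = FOREACH group_score GENERATE
            SUM(score_list.score) AS scoreTotal;

-- sum_score holds exactly one row, so it can be cast to a scalar and used
-- directly in the expression instead of joining on a dummy joinField.
out = FOREACH score_list GENERATE
      word,
      score / (double) sum_score.scoreTotal;
DUMP out;
```

Because no JOIN on a constant key is involved, this form does not depend on
which joined columns are projected in the final FOREACH.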