[ https://issues.apache.org/jira/browse/PIG-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13498166#comment-13498166 ]
Koji Noguchi commented on PIG-3051:
-----------------------------------

Sorry, I pasted the wrong plan for "After ColumnMapKeyPrune". Here is the one after pruning.

{noformat}
U1: (Name: LOStore Schema: sortCol#1871:int,label#1872:chararray,cnt#1870:long)ColumnPrune:InputUids=[1870, 1871, 1872]ColumnPrune:OutputUids=[1870, 1871, 1872]
|
|---U1: (Name: LOForEach Schema: sortCol#1871:int,label#1872:chararray,cnt#1870:long)ColumnPrune:InputUids=[1870]ColumnPrune:OutputUids=[1870, 1871, 1872]
    |   |
    |   (Name: LOGenerate[false,false,false] Schema: sortCol#1871:int,label#1872:chararray,cnt#1870:long)ColumnPrune:InputUids=[1870]ColumnPrune:OutputUids=[1870, 1871, 1872]
    |   |   |
    |   |   (Name: Constant Type: int Uid: 1871)
    |   |   |
    |   |   (Name: Constant Type: chararray Uid: 1872)
    |   |   |
    |   |   cnt:(Name: Project Type: long Uid: 1870 Input: 0 Column: (*))
    |   |
    |   |---(Name: LOInnerLoad[0] Schema: cnt#1870:long)
    |
    |---(Name: LOSort Schema: cnt#1870:long)ColumnPrune:InputUids=[1865, 1870]ColumnPrune:OutputUids=[1870]
        |   |   *****HERE*****
        |   cnt:(Name: Project Type: long Uid: 1865 Input: 0 Column: ***2***)
        |
        |---G4: (Name: LOSplitOutput Schema: cnt#1870:long)ColumnPrune:InputUids=[1865]ColumnPrune:OutputUids=[1870]
            |   |
            |   (Name: Constant Type: boolean Uid: 1867)
            |
            |---(Name: LOForEach Schema: cnt#1865:long)
                |   |
                |   (Name: LOGenerate[false] Schema: cnt#1865:long)
                |   |   |
                |   |   cnt:(Name: Project Type: long Uid: 1865 Input: 0 Column: (*))
                |   |
                |   |---(Name: LOInnerLoad[2] Schema: cnt#1865:long)
                |
                |---G4: (Name: LOSplit Schema: sortCol#1864:int,label#1857:chararray,cnt#1865:long)ColumnPrune:InputUids=[1864, 1865, 1857]ColumnPrune:OutputUids=[1864, 1865, 1857]
                    |
                    |---G4: (Name: LOSort Schema: sortCol#1864:int,label#1857:chararray,cnt#1865:long)ColumnPrune:InputUids=[1864, 1865, 1857]ColumnPrune:OutputUids=[1864, 1865, 1857]
                        |   |
                        |   cnt:(Name: Project Type: long Uid: 1865 Input: 0 Column: 2)
                        |
                        |---G3: (Name: LOForEach Schema: sortCol#1864:int,label#1857:chararray,cnt#1865:long)ColumnPrune:InputUids=[1857, 1862]ColumnPrune:OutputUids=[1864, 1865, 1857]
{noformat}

So I believe the new LOSort introduced by the LimitOptimizer has a projection that still points to the previous LOSort, which breaks when columns are pruned because the column index is not updated.

> java.lang.IndexOutOfBoundsException failure with LimitOptimizer + ColumnPruning
> --------------------------------------------------------------------------------
>
>                 Key: PIG-3051
>                 URL: https://issues.apache.org/jira/browse/PIG-3051
>             Project: Pig
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.10.0, 0.11
>            Reporter: Koji Noguchi
>            Assignee: Koji Noguchi
>
> Had a user hitting
> "Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1" error
> when he had multiple stores and a limit in his code.
> I couldn't reproduce this with short pig code (due to ColumnPruning somehow
> not happening when shortened), but here's a snippet.
> {noformat}
> ...
> G3 = FOREACH G2 GENERATE sortCol, FLATTEN(group) as label, (long)COUNT(G1) as cnt;
> G4 = ORDER G3 BY cnt DESC PARALLEL 25;
> ONEROW = LIMIT G4 1;
> U1 = FOREACH ONEROW GENERATE 3 as sortcol, 'somelabel' as label, cnt;
> store U1 into 'u1' using PigStorage();
> store G4 into 'g4' using PigStorage();
> {noformat}
> With '-t ColumnMapKeyPrune', job didn't hit the error.
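
To make the suspected failure mode concrete, here is a minimal standalone Java sketch of a projection whose column index is left stale after pruning. It does not use any Pig classes: StaleProjectionSketch, project, prune and remap are hypothetical names invented for illustration, and the tuple values are made up. It only shows why an index that was valid against a three-column schema (sortCol, label, cnt) can throw IndexOutOfBoundsException once the input is pruned down to cnt alone, and that remapping the index avoids it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Pig.
class StaleProjectionSketch {

    // Stand-in for a projection: returns the field at a fixed column index.
    static Object project(List<Object> tuple, int column) {
        return tuple.get(column);
    }

    // Stand-in for column pruning: keeps only the listed columns, in order.
    static List<Object> prune(List<Object> tuple, int... keep) {
        List<Object> out = new ArrayList<>();
        for (int k : keep) {
            out.add(tuple.get(k));
        }
        return out;
    }

    // Maps an old column index to its position among the kept columns, or -1 if pruned.
    static int remap(int oldColumn, int... keep) {
        for (int i = 0; i < keep.length; i++) {
            if (keep[i] == oldColumn) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Full row laid out as (sortCol, label, cnt); values are invented.
        List<Object> full = Arrays.<Object>asList(3, "somelabel", 42L);
        // After pruning, only cnt (old column 2) survives.
        List<Object> pruned = prune(full, 2);

        // A projection whose index was updated reads cnt correctly.
        System.out.println(project(pruned, remap(2, 2)));

        // A projection still holding the old index 2 fails on the one-column row.
        try {
            project(pruned, 2);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("stale index failed: " + e.getMessage());
        }
    }
}
```

In the plan above, the LOSort marked HERE has a one-column schema (cnt) but its Project still says Column: 2, which is the same shape of mismatch as the stale call in this sketch.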