[ 
https://issues.apache.org/jira/browse/PHOENIX-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenglei updated PHOENIX-3705:
------------------------------
    Description: 
See following simple unit test first,the rowKey is composed of three PInteger 
columns,and the slots of SkipScanFilter are:
[ [[1 - 4]], [5, 7], [[9 - 10]] ],
When SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose rowKey 
is [2,7,11], obviously SkipScanFilter.filterKeyValue
returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint 
returns  [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually 
returns  [2,8,5,9] , a very strange value, the unit tests failed.
{code} 
    @Test
    public void testNavigate() {
        RowKeySchemaBuilder builder = new RowKeySchemaBuilder(3);
        for(int i=0;i<3;i++) {
            builder.addField(
                    new PDatum() {

                        @Override
                        public boolean isNullable() {
                            return false;
                        }

                        @Override
                        public PDataType getDataType() {
                            return PInteger.INSTANCE;
                        }

                        @Override
                        public Integer getMaxLength() {
                            return PInteger.INSTANCE.getMaxLength(null);
                        }

                        @Override
                        public Integer getScale() {
                            return PInteger.INSTANCE.getScale(null);
                        }

                        @Override
                        public SortOrder getSortOrder() {
                            return SortOrder.getDefault();
                        }

                    }, false, SortOrder.getDefault());
        }
        
        List<List<KeyRange>> rowKeyColumnRangesList=Arrays.asList(      
                Arrays.asList(
                    PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1), 
true, PInteger.INSTANCE.toBytes(4), true)),
                Arrays.asList(
                    KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)),
                    KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))),
                Arrays.asList(
                    PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), 
true, PInteger.INSTANCE.toBytes(10), true))
        );
        
        SkipScanFilter skipScanFilter=new 
SkipScanFilter(rowKeyColumnRangesList, builder.build());
        
        System.out.println(skipScanFilter);
        
        byte[] rowKey=ByteUtil.concat(
                PInteger.INSTANCE.toBytes(2), 
                PInteger.INSTANCE.toBytes(7),
                PInteger.INSTANCE.toBytes(11));
        KeyValue keyValue=KeyValue.createFirstOnRow(rowKey);
        ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue);
        assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT);
        Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue);
        assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals(
        "\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09"));
    }
{code}

Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in 
SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the 
second column 7, which match the second range [7] of SkipScanFilter's second 
slot [5, 7],so position[1] is 1 and we go to the third column 11, which is 
bigger than the third slot range [9 - 10],so position[2] is 0 and the 
{{SkipScanFilter.ptr}} which points to current column still stay on the third 
column. Now we begin to backtrack to second column, because the second range 
[7] of SkipScanFilter's second slot is singleKey and there is no more range,so 
position[1] is 0 and we continue to backtrack to first column, because the 
first slot range [1-4] is not singleKey so we stop backtracking at first column.

Now the problem comes, in following line 448 of {{SkipScanFilter.navigate}} 
method,{{SkipScanFilter.setStartKey}} method is invoked,first copy rowKey 
columns before {{SkipScanFilter.ptr}} to {{SkipScanFilter.startKey}}, because 
now {{SkipScanFilter.ptr}} still point to the third column, so copy the first 
and second columns to {{SkipScanFilter.startKey}},{{SkipScanFilter.startKey}} 
is [2,7]  after this step , then setStartKey method copy the lower bound 
{{SkipScanFilter.slots}} from {{j+1}} column, accoring to 
{{SkipScanFilter.position}} array,because now j is 0, and both position[1] and 
position[2] are 0, so {{SkipScanFilter.startKey}} becomes [2,7,5,9], and in 
following line 457, {{ByteUtil.nextKey}} is invoked on [2,7], [2,7] is 
incremented to [2,8], finally {{SkipScanFilter.startKey}} is [2,8,5,9].

{code} 
448                    int currentLength = setStartKey(ptr, minOffset, j+1, 
nSlots, false);
449                    // From here on, we use startKey as our buffer 
(resetting minOffset and maxOffset)
450                    // We've copied the part of the current key above that 
we need into startKey
451                    // Reinitialize the iterator to be positioned at 
previous slot position
452                    minOffset = 0;
453                    maxOffset = startKeyLength;
454                    schema.iterator(startKey, minOffset, maxOffset, ptr, 
ScanUtil.getRowKeyPosition(slotSpan, j)+1);
455                    // Do nextKey after setting the accessor b/c otherwise 
the null byte may have
456                    // been incremented causing us not to find it
457                    ByteUtil.nextKey(startKey, currentLength);
{code} 

>From above analysis, we can see the second column is repeatedly copied to 
>{{SkipScanFilter.startKey}}, copy 7 to {{SkipScanFilter.startKey}} is error 
>,before {{SkipScanFilter.setStartKey}} method is invoked in line 448, we 
>should first move {{SkipScanFilter.ptr}} to {{j+1}}, that is to say ,following 
>code might be inserted before line 448:
{code} 
                  schema.reposition(
                            ptr, 
                            ScanUtil.getRowKeyPosition(slotSpan, i), 
                            ScanUtil.getRowKeyPosition(slotSpan, j+1), 
                            minOffset, 
                            maxOffset, 
                            slotSpan[j+1]);
{code} 
 Another problem is why the existing SkipScanFilter unit tests are ok? That is 
because in all existing unit test cases, backtrack column is just before 
current column which ptr points to,  i.e. i==j+1, so the unit tests are ok.

This issue may not affect the final query result , but obviously, it may lead 
to the erroneous startKey and  cause the Skip Scan inefficient.

  was:
See following simple unit test first,the rowKey is composed of three PInteger 
columns,and the slots of SkipScanFilter are:
[ [[1 - 4]], [5, 7], [[9 - 10]] ],
When SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose rowKey 
is [2,7,11], obviously SkipScanFilter.filterKeyValue
returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint 
returns  [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually 
returns  [2,8,5,9] , a very strange value, the unit tests failed.
{code} 
    @Test
    public void testNavigate() {
        RowKeySchemaBuilder builder = new RowKeySchemaBuilder(3);
        for(int i=0;i<3;i++) {
            builder.addField(
                    new PDatum() {

                        @Override
                        public boolean isNullable() {
                            return false;
                        }

                        @Override
                        public PDataType getDataType() {
                            return PInteger.INSTANCE;
                        }

                        @Override
                        public Integer getMaxLength() {
                            return PInteger.INSTANCE.getMaxLength(null);
                        }

                        @Override
                        public Integer getScale() {
                            return PInteger.INSTANCE.getScale(null);
                        }

                        @Override
                        public SortOrder getSortOrder() {
                            return SortOrder.getDefault();
                        }

                    }, false, SortOrder.getDefault());
        }
        
        List<List<KeyRange>> rowKeyColumnRangesList=Arrays.asList(      
                Arrays.asList(
                    PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1), 
true, PInteger.INSTANCE.toBytes(4), true)),
                Arrays.asList(
                    KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)),
                    KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))),
                Arrays.asList(
                    PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), 
true, PInteger.INSTANCE.toBytes(10), true))
        );
        
        SkipScanFilter skipScanFilter=new 
SkipScanFilter(rowKeyColumnRangesList, builder.build());
        
        System.out.println(skipScanFilter);
        
        byte[] rowKey=ByteUtil.concat(
                PInteger.INSTANCE.toBytes(2), 
                PInteger.INSTANCE.toBytes(7),
                PInteger.INSTANCE.toBytes(11));
        KeyValue keyValue=KeyValue.createFirstOnRow(rowKey);
        ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue);
        assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT);
        Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue);
        assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals(
        "\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09"));
    }
{code}

Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in 
SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the 
second column 7, which match the second range [7] of SkipScanFilter's second 
slot [5, 7],so position[1] is 1 and we go to the third column 11, which is 
bigger than the third slot range [9 - 10],so position[2] is 0 and the 
{{SkipScanFilter.ptr}} which points to current column still stay on the third 
column. Now we begin to backtrack to second column, because the second range 
[7] of SkipScanFilter's second slot is singleKey and there is no more range,so 
position[1] is 0 and we continue to backtrack to first column, because the 
first slot range [1-4] is not singleKey so we stop backtracking at first column.

Now the problem comes, in following line 448 of {{SkipScanFilter.navigate}} 
method,{{SkipScanFilter.setStartKey}} method is invoked,first copy rowKey 
columns before {{SkipScanFilter.ptr}} to {{SkipScanFilter.startKey}}, because 
now {{SkipScanFilter.ptr}} still point to the third column, so copy the first 
and second columns to {{SkipScanFilter.startKey}},{{SkipScanFilter.startKey}} 
is [2,7]  after this step , then setStartKey method copy the lower bound 
{{SkipScanFilter.slots}} from {{j+1}} column, accoring to 
{{SkipScanFilter.position}} array,now j is 0, and both position[1] and 
position[2] are 0,so {{SkipScanFilter.startKey}} becomes [2,7,5,9], and in 
following line 457, {{ByteUtil.nextKey}} is invoked on [2,7], [2,7] is 
incremented to [2,8], finally {{SkipScanFilter.startKey}} is [2,8,5,9].

{code} 
448                    int currentLength = setStartKey(ptr, minOffset, j+1, 
nSlots, false);
449                    // From here on, we use startKey as our buffer 
(resetting minOffset and maxOffset)
450                    // We've copied the part of the current key above that 
we need into startKey
451                    // Reinitialize the iterator to be positioned at 
previous slot position
452                    minOffset = 0;
453                    maxOffset = startKeyLength;
454                    schema.iterator(startKey, minOffset, maxOffset, ptr, 
ScanUtil.getRowKeyPosition(slotSpan, j)+1);
455                    // Do nextKey after setting the accessor b/c otherwise 
the null byte may have
456                    // been incremented causing us not to find it
457                    ByteUtil.nextKey(startKey, currentLength);
{code} 

>From above analysis, we can see the second column is repeatedly copied to 
>{{SkipScanFilter.startKey}}, copy 7 to {{SkipScanFilter.startKey}} is error 
>,before {{SkipScanFilter.setStartKey}} method is invoked in line 448, we 
>should first move {{SkipScanFilter.ptr}} to {{j+1}}, that is to say ,following 
>code might be inserted before line 448:
{code} 
                  schema.reposition(
                            ptr, 
                            ScanUtil.getRowKeyPosition(slotSpan, i), 
                            ScanUtil.getRowKeyPosition(slotSpan, j+1), 
                            minOffset, 
                            maxOffset, 
                            slotSpan[j+1]);
{code} 
 Another problem is why the existing SkipScanFilter unit tests are ok? That is 
because in all existing unit test cases, backtrack column is just before 
current column which ptr points to,  i.e. i==j+1, so the unit tests are ok.

This issue may not affect the final query result , but obviously, it may lead 
to the erroneous startKey and  cause the Skip Scan inefficient.


> SkipScanFilter may repeatedly copy rowKey Columns to startKey
> -------------------------------------------------------------
>
>                 Key: PHOENIX-3705
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3705
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.9.0
>            Reporter: chenglei
>         Attachments: PHOENIX-3705_v1.patch
>
>
> See following simple unit test first,the rowKey is composed of three PInteger 
> columns,and the slots of SkipScanFilter are:
> [ [[1 - 4]], [5, 7], [[9 - 10]] ],
> When SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose 
> rowKey is [2,7,11], obviously SkipScanFilter.filterKeyValue
> returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint 
> returns  [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually 
> returns  [2,8,5,9] , a very strange value, the unit tests failed.
> {code} 
>     @Test
>     public void testNavigate() {
>         RowKeySchemaBuilder builder = new RowKeySchemaBuilder(3);
>         for(int i=0;i<3;i++) {
>             builder.addField(
>                     new PDatum() {
>                         @Override
>                         public boolean isNullable() {
>                             return false;
>                         }
>                         @Override
>                         public PDataType getDataType() {
>                             return PInteger.INSTANCE;
>                         }
>                         @Override
>                         public Integer getMaxLength() {
>                             return PInteger.INSTANCE.getMaxLength(null);
>                         }
>                         @Override
>                         public Integer getScale() {
>                             return PInteger.INSTANCE.getScale(null);
>                         }
>                         @Override
>                         public SortOrder getSortOrder() {
>                             return SortOrder.getDefault();
>                         }
>                     }, false, SortOrder.getDefault());
>         }
>         
>         List<List<KeyRange>> rowKeyColumnRangesList=Arrays.asList(      
>                 Arrays.asList(
>                     
> PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1), true, 
> PInteger.INSTANCE.toBytes(4), true)),
>                 Arrays.asList(
>                     KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)),
>                     KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))),
>                 Arrays.asList(
>                     
> PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), true, 
> PInteger.INSTANCE.toBytes(10), true))
>         );
>         
>         SkipScanFilter skipScanFilter=new 
> SkipScanFilter(rowKeyColumnRangesList, builder.build());
>         
>         System.out.println(skipScanFilter);
>         
>         byte[] rowKey=ByteUtil.concat(
>                 PInteger.INSTANCE.toBytes(2), 
>                 PInteger.INSTANCE.toBytes(7),
>                 PInteger.INSTANCE.toBytes(11));
>         KeyValue keyValue=KeyValue.createFirstOnRow(rowKey);
>         ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue);
>         assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT);
>         Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue);
>         
> assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals(
>         "\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09"));
>     }
> {code}
> Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in 
> SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the 
> second column 7, which match the second range [7] of SkipScanFilter's second 
> slot [5, 7],so position[1] is 1 and we go to the third column 11, which is 
> bigger than the third slot range [9 - 10],so position[2] is 0 and the 
> {{SkipScanFilter.ptr}} which points to current column still stay on the third 
> column. Now we begin to backtrack to second column, because the second range 
> [7] of SkipScanFilter's second slot is singleKey and there is no more 
> range,so position[1] is 0 and we continue to backtrack to first column, 
> because the first slot range [1-4] is not singleKey so we stop backtracking 
> at first column.
> Now the problem comes, in following line 448 of {{SkipScanFilter.navigate}} 
> method,{{SkipScanFilter.setStartKey}} method is invoked,first copy rowKey 
> columns before {{SkipScanFilter.ptr}} to {{SkipScanFilter.startKey}}, because 
> now {{SkipScanFilter.ptr}} still point to the third column, so copy the first 
> and second columns to {{SkipScanFilter.startKey}},{{SkipScanFilter.startKey}} 
> is [2,7]  after this step , then setStartKey method copy the lower bound 
> {{SkipScanFilter.slots}} from {{j+1}} column, accoring to 
> {{SkipScanFilter.position}} array,because now j is 0, and both position[1] 
> and position[2] are 0, so {{SkipScanFilter.startKey}} becomes [2,7,5,9], and 
> in following line 457, {{ByteUtil.nextKey}} is invoked on [2,7], [2,7] is 
> incremented to [2,8], finally {{SkipScanFilter.startKey}} is [2,8,5,9].
> {code} 
> 448                    int currentLength = setStartKey(ptr, minOffset, j+1, 
> nSlots, false);
> 449                    // From here on, we use startKey as our buffer 
> (resetting minOffset and maxOffset)
> 450                    // We've copied the part of the current key above that 
> we need into startKey
> 451                    // Reinitialize the iterator to be positioned at 
> previous slot position
> 452                    minOffset = 0;
> 453                    maxOffset = startKeyLength;
> 454                    schema.iterator(startKey, minOffset, maxOffset, ptr, 
> ScanUtil.getRowKeyPosition(slotSpan, j)+1);
> 455                    // Do nextKey after setting the accessor b/c otherwise 
> the null byte may have
> 456                    // been incremented causing us not to find it
> 457                    ByteUtil.nextKey(startKey, currentLength);
> {code} 
> From above analysis, we can see the second column is repeatedly copied to 
> {{SkipScanFilter.startKey}}, copy 7 to {{SkipScanFilter.startKey}} is error 
> ,before {{SkipScanFilter.setStartKey}} method is invoked in line 448, we 
> should first move {{SkipScanFilter.ptr}} to {{j+1}}, that is to say 
> ,following code might be inserted before line 448:
> {code} 
>                   schema.reposition(
>                             ptr, 
>                             ScanUtil.getRowKeyPosition(slotSpan, i), 
>                             ScanUtil.getRowKeyPosition(slotSpan, j+1), 
>                             minOffset, 
>                             maxOffset, 
>                             slotSpan[j+1]);
> {code} 
>  Another problem is why the existing SkipScanFilter unit tests are ok? That 
> is because in all existing unit test cases, backtrack column is just before 
> current column which ptr points to,  i.e. i==j+1, so the unit tests are ok.
> This issue may not affect the final query result , but obviously, it may lead 
> to the erroneous startKey and  cause the Skip Scan inefficient.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to