[jira] [Comment Edited] (PHOENIX-3705) SkipScanFilter may repeatedly copy rowKey Columns to startKey

James Taylor (JIRA) Wed, 01 Mar 2017 11:23:59 -0800

    [ 
https://issues.apache.org/jira/browse/PHOENIX-3705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15890854#comment-15890854
 ]


James Taylor edited comment on PHOENIX-3705 at 3/1/17 7:23 PM:
---------------------------------------------------------------

Very nice, [~comnetwork]. It seems like this bug could affect the final query 
result, perhaps skipping rows it shouldn't depending on the data, no? Anytime 
the seek next hint is wrong, I think this could be the case.

Small question on whether perhaps one more change is required. I noticed you 
used {{ScanUtil.getRowKeyPosition(slotSpan, j + 1)}} below in the call to 
reposition, but further down (in existing code), we use 
{{ScanUtil.getRowKeyPosition(slotSpan, j)+1}}. Should the latter be changed as 
you've coded? Would be good to have a test case that uses Row Value Constructor 
(i.e. has a slot span) to confirm:
{code}
                    schema.reposition(
                            ptr,
                            ScanUtil.getRowKeyPosition(slotSpan, i),
                            ScanUtil.getRowKeyPosition(slotSpan, j + 1),
                            minOffset,
                            maxOffset,
                            slotSpan[j + 1]);
                    int currentLength = setStartKey(ptr, minOffset, j+1, 
nSlots, false);
                    // From here on, we use startKey as our buffer (resetting 
minOffset and maxOffset)
                    // We've copied the part of the current key above that we 
need into startKey
                    // Reinitialize the iterator to be positioned at previous 
slot position
                    minOffset = 0;
                    maxOffset = startKeyLength;
                    schema.iterator(startKey, minOffset, maxOffset, ptr, 
ScanUtil.getRowKeyPosition(slotSpan, j)+1);
{code}

One more small request: would you mind adding a code comment before your 
reposition call explaining why that's necessary?


was (Author: jamestaylor):
Very nice, [~comnetwork]. It seems like this bug could affect the final query 
result, perhaps skipping rows it shouldn't depending on the data, no? Anytime 
the seek next hint is wrong, I think this could be the case.

Small question on whether perhaps one more change is required. I noticed you 
used {{ScanUtil.getRowKeyPosition(slotSpan, j + 1)}} below in the call to 
reposition, but further down (in existing code), we use 
{{ScanUtil.getRowKeyPosition(slotSpan, j)+1}}. Should the latter be changed as 
you've coded? Would be good to have a test case that uses Row Value Constructor 
(i.e. has a slot span) to confirm:
{code}
                    schema.reposition(
                            ptr,
                            ScanUtil.getRowKeyPosition(slotSpan, i),
                            ScanUtil.getRowKeyPosition(slotSpan, j + 1),
                            minOffset,
                            maxOffset,
                            slotSpan[j + 1]);
                    int currentLength = setStartKey(ptr, minOffset, j+1, 
nSlots, false);
                    // From here on, we use startKey as our buffer (resetting 
minOffset and maxOffset)
                    // We've copied the part of the current key above that we 
need into startKey
                    // Reinitialize the iterator to be positioned at previous 
slot position
                    minOffset = 0;
                    maxOffset = startKeyLength;
                    schema.iterator(startKey, minOffset, maxOffset, ptr, 
ScanUtil.getRowKeyPosition(slotSpan, j)+1);
{code}

> SkipScanFilter may repeatedly copy rowKey Columns to startKey
> -------------------------------------------------------------
>
>                 Key: PHOENIX-3705
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3705
>             Project: Phoenix
>          Issue Type: Bug
>    Affects Versions: 4.9.0
>            Reporter: chenglei
>            Priority: Blocker
>             Fix For: 4.10.0
>
>         Attachments: PHOENIX-3705_v1.patch
>
>
> See following simple unit test first,the rowKey is composed of three PInteger 
> columns,and the slots of SkipScanFilter are:
> [ [[1 - 4]], [5, 7], [[9 - 10]] ],
> When SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose 
> rowKey is [2,7,11], obviously SkipScanFilter.filterKeyValue
> returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint 
> returns  [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually 
> returns  [2,8,5,9] , a very strange value, the unit tests failed.
> {code} 
>     @Test
>     public void testNavigate() {
>         RowKeySchemaBuilder builder = new RowKeySchemaBuilder(3);
>         for(int i=0;i<3;i++) {
>             builder.addField(
>                     new PDatum() {
>                         @Override
>                         public boolean isNullable() {
>                             return false;
>                         }
>                         @Override
>                         public PDataType getDataType() {
>                             return PInteger.INSTANCE;
>                         }
>                         @Override
>                         public Integer getMaxLength() {
>                             return PInteger.INSTANCE.getMaxLength(null);
>                         }
>                         @Override
>                         public Integer getScale() {
>                             return PInteger.INSTANCE.getScale(null);
>                         }
>                         @Override
>                         public SortOrder getSortOrder() {
>                             return SortOrder.getDefault();
>                         }
>                     }, false, SortOrder.getDefault());
>         }
>         
>         List<List<KeyRange>> rowKeyColumnRangesList=Arrays.asList(      
>                 Arrays.asList(
>                     
> PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1), true, 
> PInteger.INSTANCE.toBytes(4), true)),
>                 Arrays.asList(
>                     KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)),
>                     KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))),
>                 Arrays.asList(
>                     
> PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9), true, 
> PInteger.INSTANCE.toBytes(10), true))
>         );
>         
>         SkipScanFilter skipScanFilter=new 
> SkipScanFilter(rowKeyColumnRangesList, builder.build());
>         
>         System.out.println(skipScanFilter);
>         
>         byte[] rowKey=ByteUtil.concat(
>                 PInteger.INSTANCE.toBytes(2), 
>                 PInteger.INSTANCE.toBytes(7),
>                 PInteger.INSTANCE.toBytes(11));
>         KeyValue keyValue=KeyValue.createFirstOnRow(rowKey);
>         ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue);
>         assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT);
>         Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue);
>         
> assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals(
>         "\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09"));
>     }
> {code}
> Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in 
> SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the 
> second column 7, which match the second range [7] of SkipScanFilter's second 
> slot [5, 7],so position[1] is 1 and we go to the third column 11, which is 
> bigger than the third slot range [9 - 10],so position[2] is 0 and the 
> {{SkipScanFilter.ptr}} which points to current column still stay on the third 
> column. Now we begin to backtrack to second column, because the second range 
> [7] of SkipScanFilter's second slot is singleKey and there is no more 
> range,so position[1] is 0 and we continue to backtrack to first column, 
> because the first slot range [1-4] is not singleKey so we stop backtracking 
> at first column.
> Now the problem comes, in following line 448 of {{SkipScanFilter.navigate}} 
> method,{{SkipScanFilter.setStartKey}} method is invoked,first copy rowKey 
> columns before {{SkipScanFilter.ptr}} to {{SkipScanFilter.startKey}}, because 
> now {{SkipScanFilter.ptr}} still point to the third column, so copy the first 
> and second columns to {{SkipScanFilter.startKey}},{{SkipScanFilter.startKey}} 
> is [2,7]  after this step , then setStartKey method copy the lower bound 
> {{SkipScanFilter.slots}} from {{j+1}} column, accoring to 
> {{SkipScanFilter.position}} array,because now j is 0, and both position[1] 
> and position[2] are 0, so {{SkipScanFilter.startKey}} becomes [2,7,5,9], and 
> in following line 457, {{ByteUtil.nextKey}} is invoked on [2,7], [2,7] is 
> incremented to [2,8], finally {{SkipScanFilter.startKey}} is [2,8,5,9].
> {code} 
> 448                    int currentLength = setStartKey(ptr, minOffset, j+1, 
> nSlots, false);
> 449                    // From here on, we use startKey as our buffer 
> (resetting minOffset and maxOffset)
> 450                    // We've copied the part of the current key above that 
> we need into startKey
> 451                    // Reinitialize the iterator to be positioned at 
> previous slot position
> 452                    minOffset = 0;
> 453                    maxOffset = startKeyLength;
> 454                    schema.iterator(startKey, minOffset, maxOffset, ptr, 
> ScanUtil.getRowKeyPosition(slotSpan, j)+1);
> 455                    // Do nextKey after setting the accessor b/c otherwise 
> the null byte may have
> 456                    // been incremented causing us not to find it
> 457                    ByteUtil.nextKey(startKey, currentLength);
> {code} 
> From above analysis, we can see the second column is repeatedly copied to 
> {{SkipScanFilter.startKey}}, copy 7 to {{SkipScanFilter.startKey}} is error 
> ,before {{SkipScanFilter.setStartKey}} method is invoked in line 448, we 
> should first move {{SkipScanFilter.ptr}} to {{j+1}}, that is to say 
> ,following code might be inserted before line 448:
> {code} 
>                   schema.reposition(
>                             ptr, 
>                             ScanUtil.getRowKeyPosition(slotSpan, i), 
>                             ScanUtil.getRowKeyPosition(slotSpan, j+1), 
>                             minOffset, 
>                             maxOffset, 
>                             slotSpan[j+1]);
> {code} 
>  Another problem is why the existing SkipScanFilter unit tests are ok? That 
> is because in all existing unit test cases, backtrack column is just before 
> current column which ptr points to,  i.e. i==j+1, so the unit tests are ok.
> This issue may not affect the final query result , but obviously, it may lead 
> to the erroneous startKey and  cause the Skip Scan inefficient.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

[jira] [Comment Edited] (PHOENIX-3705) SkipScanFilter may repeatedly copy rowKey Columns to startKey

Reply via email to