chenglei created PHOENIX-3705:
---------------------------------
Summary: SkipScanFilter may repeatedly copy rowKey Columns to
startKey
Key: PHOENIX-3705
URL: https://issues.apache.org/jira/browse/PHOENIX-3705
Project: Phoenix
Issue Type: Bug
Affects Versions: 4.9.0
Reporter: chenglei
See following simple unit test first,the rowKey is composed of three PInteger
columns,and the slots of SkipScanFilter are:
[[[1 - 4]], [5, 7], [[9 - 10]]],
When SkipScanFilter.filterKeyValue method is invoked on a KeyValue whose rowKey
is [2,7,11], obviously SkipScanFilter.filterKeyValue
returns ReturnCode.SEEK_NEXT_USING_HINT and SkipScanFilter.getNextCellHint
returns [3,5,9], but unfortunately, SkipScanFilter.getNextCellHint actually
returns [2,8,5,9] , a very strange value, the unit tests failed.
{code}
@Test
public void testNavigate() {
RowKeySchemaBuilder builder = new RowKeySchemaBuilder(3);
for(int i=0;i<3;i++) {
builder.addField(
new PDatum() {
@Override
public boolean isNullable() {
return false;
}
@Override
public PDataType getDataType() {
return PInteger.INSTANCE;
}
@Override
public Integer getMaxLength() {
return PInteger.INSTANCE.getMaxLength(null);
}
@Override
public Integer getScale() {
return PInteger.INSTANCE.getScale(null);
}
@Override
public SortOrder getSortOrder() {
return SortOrder.getDefault();
}
}, false, SortOrder.getDefault());
}
List<List<KeyRange>> rowKeyColumnRangesList=Arrays.asList(
Arrays.asList(
PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(1),
true, PInteger.INSTANCE.toBytes(4), true)),
Arrays.asList(
KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(5)),
KeyRange.getKeyRange(PInteger.INSTANCE.toBytes(7))),
Arrays.asList(
PInteger.INSTANCE.getKeyRange(PInteger.INSTANCE.toBytes(9),
true, PInteger.INSTANCE.toBytes(10), true))
);
SkipScanFilter skipScanFilter=new
SkipScanFilter(rowKeyColumnRangesList, builder.build());
System.out.println(skipScanFilter);
byte[] rowKey=ByteUtil.concat(
PInteger.INSTANCE.toBytes(2),
PInteger.INSTANCE.toBytes(7),
PInteger.INSTANCE.toBytes(11));
KeyValue keyValue=KeyValue.createFirstOnRow(rowKey);
ReturnCode returnCode=skipScanFilter.filterKeyValue(keyValue);
assertTrue(returnCode == ReturnCode.SEEK_NEXT_USING_HINT);
Cell nextCellHint=skipScanFilter.getNextCellHint(keyValue);
assertTrue(Bytes.toStringBinary(CellUtil.cloneRow(nextCellHint)).equals("\\x80\\x00\\x00\\x03\\x80\\x00\\x00\\x05\\x80\\x00\\x00\\x09"));
}
{code}
Let us see what's wrong, first column of rowKey [2,7,11 ] is 2, which is in
SkipScanFilter's first slot range [1-4], so position[0] is 0 and we go to the
second column 7, which match the second range [7] of SkipScanFilter's second
slot [5, 7],so position[1] is 1 and we go to the third column 11, which is
bigger than the third slot range [9 - 10],so position[2] is 0 and we begin to
backtrack to second column, because the second range [7] of SkipScanFilter's
second slot is singleKey and there is no more range,so position[1] is 0 and we
continue to backtrack to first column, because the first slot range [1-4] is
not singleKey so we stop backtracking at first column.
Now the problem comes, in following line 448 of SkipScanFilter.navigate
method,SkipScanFilter.setStartKey method is invoked,first copy rowKey columns
before ptr to SkipScanFilter.startKey, because now ptr still point to the third
column, so copy the first and second columns to
SkipScanFilter.startKey,SkipScanFilter.startKey is [2,7] after this step ,
then setStartKey method copy the lower bound SkipScanFilter.slots from j+1
column, accoring to SkipScanFilter.position array,now j is 0, and both
position[1] and position[2] are 0,so SkipScanFilter.startKey becomes [2,7,5,9],
and in following line 457, ByteUtil.nextKey is invoked on [2,7], [2,7] is
incremented to [2,8], finally SkipScanFilter.startKey is [2,8,5,9].
{code}
448 int currentLength = setStartKey(ptr, minOffset, j+1,
nSlots, false);
449 // From here on, we use startKey as our buffer
(resetting minOffset and maxOffset)
450 // We've copied the part of the current key above that we
need into startKey
451 // Reinitialize the iterator to be positioned at
previous slot position
452 minOffset = 0;
453 maxOffset = startKeyLength;
454 schema.iterator(startKey, minOffset, maxOffset, ptr,
ScanUtil.getRowKeyPosition(slotSpan, j)+1);
455 // Do nextKey after setting the accessor b/c otherwise
the null byte may have
456 // been incremented causing us not to find it
457 ByteUtil.nextKey(startKey, currentLength);
{code}
>From above analysis, we can see the second column is repeatedly copied to
>SkipScanFilter.startKey, copy 7 to SkipScanFilter.startKey is error ,before
>SkipScanFilter.setStartKey method is invoked in line 448, we should first move
>ptr to j+1, that is to say ,following code might be inserted before line 448:
{code}
schema.reposition(
ptr,
ScanUtil.getRowKeyPosition(slotSpan, i),
ScanUtil.getRowKeyPosition(slotSpan, j+1),
minOffset,
maxOffset,
slotSpan[j+1]);
{code}
Another problem is why the existing SkipScanFilter unit tests are ok? That is
because in all existing unit test cases, backtrack column is just before
current column which ptr points to, i.e. i==j+1, and this problem may not
affect the final query result , but obviously, it may lead to the erroneous
startKey and cause the Skip Scan inefficient.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)