[
https://issues.apache.org/jira/browse/PHOENIX-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592844#comment-15592844
]
Samarth Jain commented on PHOENIX-3392:
---------------------------------------
Thanks for taking a look, [~maryannxue]. The general idea behind the encoding
scheme is to use number-based column qualifiers, both to save space and to
optimize key value lookup. When constructing the scan object, we figure out the
range of column qualifiers that we need to project in the scan. This is done in
the following method in BaseResultIterators:
{code}
private static Pair<Integer, Integer> getMinMaxQualifiers(Scan scan,
StatementContext context)
{code}
This range is then set in the BaseResultIterators.initializeScan method like
this:
{code}
if (setMinMaxQualifiersOnScan(table)) {
    Pair<Integer, Integer> minMaxQualifiers =
            getMinMaxQualifiers(scan, context);
    if (minMaxQualifiers != null) {
        scan.setAttribute(BaseScannerRegionObserver.MIN_QUALIFIER,
                PInteger.INSTANCE.toBytes(minMaxQualifiers.getFirst()));
        scan.setAttribute(BaseScannerRegionObserver.MAX_QUALIFIER,
                PInteger.INSTANCE.toBytes(minMaxQualifiers.getSecond()));
    }
}
{code}
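As a rough illustration of the range computation (not the actual
getMinMaxQualifiers code), the sketch below takes the smallest and largest
qualifier among the projected columns. The class name QualifierRange, the
minMax helper, and the use of a plain list of integers as input are all
hypothetical; the real method works from the Scan and StatementContext.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Phoenix: derives the qualifier range
// that a client could attach to the scan.
class QualifierRange {

    /** Returns {min, max} over the projected qualifiers, or null if none are projected. */
    static int[] minMax(List<Integer> projectedQualifiers) {
        if (projectedQualifiers.isEmpty()) {
            return null; // nothing to project, so no range is set on the scan
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int qualifier : projectedQualifiers) {
            min = Math.min(min, qualifier);
            max = Math.max(max, qualifier);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        // A single projected column with qualifier 11 gives the range (11, 11).
        System.out.println(Arrays.toString(minMax(Arrays.asList(11))));
        // If a column with qualifier 12 is also projected, the range must widen.
        System.out.println(Arrays.toString(minMax(Arrays.asList(11, 12))));
    }
}
```

The point of the sketch is that every qualifier the server may return has to
be in the input; any column missed here ends up outside the range.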
On the server side, we use a custom list implementation called
EncodedColumnQualifierCellsList. Using this list, we can do an O(1) lookup for
a particular key value by using the number-based column qualifier as an index
into the list. The range that we set on the scan object on the client side is
then used to appropriately size the EncodedColumnQualifierCellsList.
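To make the indexing idea concrete, here is a minimal sketch of a
qualifier-indexed container. It is not the EncodedColumnQualifierCellsList
implementation: the class name QualifierIndexedList and its methods are
hypothetical, and it ignores the reserved qualifiers, which the real list also
has to accommodate. It only shows how a (min, max) range can size a backing
array so that a lookup by qualifier is a single array access.

```java
// Hypothetical, simplified stand-in for a list indexed by encoded qualifier.
class QualifierIndexedList<T> {
    private final int minQualifier;
    private final int maxQualifier;
    private final Object[] slots;

    QualifierIndexedList(int minQualifier, int maxQualifier) {
        if (minQualifier > maxQualifier) {
            throw new IllegalArgumentException("min must not exceed max");
        }
        this.minQualifier = minQualifier;
        this.maxQualifier = maxQualifier;
        // One slot per qualifier in the range announced by the client.
        this.slots = new Object[maxQualifier - minQualifier + 1];
    }

    void put(int qualifier, T value) {
        slots[indexOf(qualifier)] = value;
    }

    @SuppressWarnings("unchecked")
    T get(int qualifier) {
        return (T) slots[indexOf(qualifier)];
    }

    // O(1): the qualifier itself, offset by the range start, is the array index.
    private int indexOf(int qualifier) {
        if (qualifier < minQualifier || qualifier > maxQualifier) {
            throw new IndexOutOfBoundsException("Qualifier " + qualifier
                    + " is outside the range (" + minQualifier + ", " + maxQualifier + ")");
        }
        return qualifier - minQualifier;
    }

    public static void main(String[] args) {
        QualifierIndexedList<String> cells = new QualifierIndexedList<>(11, 11);
        cells.put(11, "value-for-qualifier-11");
        System.out.println(cells.get(11));
        try {
            cells.put(12, "unexpected"); // the range was computed too narrowly
        } catch (IndexOutOfBoundsException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a design like this, the list is only as large as the announced range, so
a cell whose qualifier falls outside it cannot be stored and has to be
rejected.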
In this particular test case, it looks like the way we are determining the
range of column qualifiers is incorrect. The server side scanner is returning
a key value with qualifier 12 even though we expect the largest qualifier
returned to be 11. One thing to note here is that we reserve the range
(0, 10) for Phoenix internal column qualifiers like the empty column's
qualifier, the result column qualifier, etc. So in this case our client side
code determined that there was only one column qualifier to be projected, and
that qualifier was 11.
{code}
Caused by: java.lang.IndexOutOfBoundsException: Qualifier 12 is out of the
valid range. Reserved: (0, 10). Table column qualifier range: (11, 11)
{code}
> SortMergeJoinIT#testSubJoin[0] is failing with encodecolumns2 branch
> --------------------------------------------------------------------
>
> Key: PHOENIX-3392
> URL: https://issues.apache.org/jira/browse/PHOENIX-3392
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> The SortMergeJoinIT#testSubJoin[0] test is failing on the encodecolumns2
> branch. To repro, check out the encodecolumns2 branch and run the
> SortMergeJoinIT tests. The stack trace is here:
> https://builds.apache.org/job/Phoenix-encode-columns/16/testReport/org.apache.phoenix.end2end/SortMergeJoinIT/testSubJoin_0_/
> The basic idea of column encoding over mutable tables is that we take control
> of the column qualifier name, storing in PColumn the mapping of real column
> name to column qualifier name. We use serialized integers as the column
> qualifier names so that we can do positional lookups into the List<Cell> we
> get back from HBase APIs. There are a few "reserved" column qualifiers for
> things like our empty key value, etc.