[ https://issues.apache.org/jira/browse/PHOENIX-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15592844#comment-15592844 ]

Samarth Jain commented on PHOENIX-3392:
---------------------------------------

Thanks for taking a look, [~maryannxue]. The general idea behind the encoding 
scheme, which saves space and speeds up key value lookups, is to use number-based 
column qualifiers. When constructing the scan object, we determine the range of 
column qualifiers that needs to be projected in the scan. This is done in the 
following method in BaseResultIterators:

{code}
private static Pair<Integer, Integer> getMinMaxQualifiers(Scan scan, StatementContext context)
{code}
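The body of getMinMaxQualifiers is not shown here. As a rough sketch of the idea only, assuming the encoded qualifiers of the projected columns have already been collected, the range is simply their minimum and maximum. The QualifierRange class below is hypothetical and is not Phoenix code.

```java
import java.util.Collection;

/**
 * Hypothetical sketch: derive the inclusive [min, max] range of encoded
 * column qualifiers to project from the qualifiers of the projected columns.
 * Illustrates the idea only; this is not Phoenix's getMinMaxQualifiers.
 */
final class QualifierRange {
    final int min;
    final int max;

    QualifierRange(int min, int max) {
        this.min = min;
        this.max = max;
    }

    /** Returns null when nothing is projected, mirroring the null check in initializeScan. */
    static QualifierRange of(Collection<Integer> projectedQualifiers) {
        if (projectedQualifiers.isEmpty()) {
            return null;
        }
        int min = Integer.MAX_VALUE;
        int max = Integer.MIN_VALUE;
        for (int q : projectedQualifiers) {
            min = Math.min(min, q);
            max = Math.max(max, q);
        }
        return new QualifierRange(min, max);
    }

    @Override
    public String toString() {
        return "(" + min + ", " + max + ")";
    }
}
```

If the collection passed in misses a column that the server later returns, the computed range comes out too narrow; a range of (11, 11) alongside a returned qualifier 12 is consistent with that kind of gap.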

This range is then set on the scan in the BaseResultIterators.initializeScan 
method:

{code}
if (setMinMaxQualifiersOnScan(table)) {
    // Compute the range of encoded column qualifiers the query projects
    Pair<Integer, Integer> minMaxQualifiers = getMinMaxQualifiers(scan, context);
    if (minMaxQualifiers != null) {
        // Pass the range to the server so it can size its cell list
        scan.setAttribute(BaseScannerRegionObserver.MIN_QUALIFIER,
                PInteger.INSTANCE.toBytes(minMaxQualifiers.getFirst()));
        scan.setAttribute(BaseScannerRegionObserver.MAX_QUALIFIER,
                PInteger.INSTANCE.toBytes(minMaxQualifiers.getSecond()));
    }
}
{code}
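A minimal sketch of this client/server handoff, assuming a Map stands in for the HBase Scan attributes and plain 4-byte big-endian ints stand in for PInteger.INSTANCE.toBytes (Phoenix's actual byte encoding may differ). The attribute keys and the ScanAttributesSketch class are illustrative only.

```java
import java.nio.ByteBuffer;
import java.util.Map;

/**
 * Hypothetical sketch: the client stores the qualifier range as byte[]
 * attributes, and the server reads them back to size its cell list.
 * A Map stands in for Scan attributes; big-endian ints stand in for
 * PInteger.INSTANCE.toBytes, whose real encoding may differ.
 */
final class ScanAttributesSketch {
    static final String MIN_QUALIFIER = "MIN_QUALIFIER"; // illustrative key, not Phoenix's
    static final String MAX_QUALIFIER = "MAX_QUALIFIER"; // illustrative key, not Phoenix's

    static byte[] toBytes(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    static int toInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    /** Client side: record the projected range on the "scan". */
    static void setRange(Map<String, byte[]> scanAttributes, int min, int max) {
        scanAttributes.put(MIN_QUALIFIER, toBytes(min));
        scanAttributes.put(MAX_QUALIFIER, toBytes(max));
    }

    /** Server side: slots needed for the inclusive table range, or -1 if no range was set. */
    static int slotsForRange(Map<String, byte[]> scanAttributes) {
        byte[] min = scanAttributes.get(MIN_QUALIFIER);
        byte[] max = scanAttributes.get(MAX_QUALIFIER);
        if (min == null || max == null) {
            return -1; // this sketch uses -1 to mean "range not set"
        }
        return toInt(max) - toInt(min) + 1;
    }
}
```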

On the server side, we use a custom list implementation called 
EncodedColumnQualifierCellsList. With this list, we can look up a particular key 
value in O(1) by using its number-based column qualifier as an index into the 
list. The range that the client sets on the scan object is used to size the 
EncodedColumnQualifierCellsList appropriately.
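To illustrate the O(1) lookup, here is a hypothetical QualifierIndexedCells class, loosely modeled on the idea behind EncodedColumnQualifierCellsList rather than on its source. It assumes both bounds of each range are inclusive, that the reserved qualifiers occupy the first slots, and that the table's range follows them; the exception message mirrors the format of the IndexOutOfBoundsException from the failing test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch of a qualifier-indexed cell list (not Phoenix's
 * EncodedColumnQualifierCellsList). Each encoded qualifier maps to a fixed
 * slot, so lookup by qualifier is O(1). Reserved qualifiers
 * [reservedMin, reservedMax] take the first slots; the table range
 * [min, max] follows. All bounds are inclusive.
 */
final class QualifierIndexedCells<T> {
    private final int reservedMin;
    private final int reservedMax;
    private final int min;
    private final int max;
    private final List<T> slots;

    QualifierIndexedCells(int reservedMin, int reservedMax, int min, int max) {
        this.reservedMin = reservedMin;
        this.reservedMax = reservedMax;
        this.min = min;
        this.max = max;
        int size = (reservedMax - reservedMin + 1) + (max - min + 1);
        this.slots = new ArrayList<>(Collections.<T>nCopies(size, null));
    }

    /** Maps a qualifier to its slot, or throws if it is outside both ranges. */
    private int slotOf(int qualifier) {
        if (qualifier >= reservedMin && qualifier <= reservedMax) {
            return qualifier - reservedMin;
        }
        if (qualifier >= min && qualifier <= max) {
            return (reservedMax - reservedMin + 1) + (qualifier - min);
        }
        throw new IndexOutOfBoundsException("Qualifier " + qualifier
                + " is out of the valid range. Reserved: (" + reservedMin + ", " + reservedMax
                + "). Table column qualifier range: (" + min + ", " + max + ")");
    }

    void put(int qualifier, T cell) {
        slots.set(slotOf(qualifier), cell);
    }

    T get(int qualifier) {
        return slots.get(slotOf(qualifier));
    }
}
```

For example, with a reserved range of (0, 10) and a table range of (11, 11), get(11) reads the single table slot, while put(12, cell) throws, which matches the shape of the failure in this test.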

In this particular test case, it looks like the way we determine the range of 
column qualifiers is incorrect. The server-side scanner returns a key value with 
qualifier 12, even though we expect the largest returned qualifier to be 11. Note 
that we reserve the range (0, 10) for Phoenix-internal column qualifiers, such as 
the empty column's qualifier and the result column qualifier. So in this case, our 
client-side code determined that only one column qualifier, 11, needed to be 
projected.

{code}
Caused by: java.lang.IndexOutOfBoundsException: Qualifier 12 is out of the valid range. Reserved: (0, 10). Table column qualifier range: (11, 11)
{code}
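When debugging a case like this, a check along the following lines could flag returned qualifiers that fall outside both the reserved range and the range the client set on the scan. QualifierRangeCheck is a hypothetical helper, not part of Phoenix, and it assumes inclusive bounds.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical diagnostic: report qualifiers returned by the server that are
 * in neither the reserved range nor the table range set on the scan.
 * All bounds are treated as inclusive.
 */
final class QualifierRangeCheck {
    static List<Integer> outOfRange(int reservedMin, int reservedMax, int min, int max,
                                    List<Integer> returnedQualifiers) {
        List<Integer> bad = new ArrayList<>();
        for (int q : returnedQualifiers) {
            boolean reserved = q >= reservedMin && q <= reservedMax;
            boolean inTable = q >= min && q <= max;
            if (!reserved && !inTable) {
                bad.add(q);
            }
        }
        return bad;
    }
}
```

With the values from this test (reserved (0, 10), table range (11, 11)) and returned qualifiers 0, 11 and 12, it reports only 12.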


> SortMergeJoinIT#testSubJoin[0] is failing with encodecolumns2 branch
> --------------------------------------------------------------------
>
>                 Key: PHOENIX-3392
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3392
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> The SortMergeJoinIT#testSubJoin[0] is failing with the encodecolumns2 branch. To 
> repro, check out the encodecolumns2 branch and run the SortMergeJoinIT tests. 
> The stack trace is here: 
> https://builds.apache.org/job/Phoenix-encode-columns/16/testReport/org.apache.phoenix.end2end/SortMergeJoinIT/testSubJoin_0_/
> The basic idea of column encoding over mutable tables is that we take control 
> of the column qualifier name, storing in PColumn the mapping from the real 
> column name to the column qualifier name. We use serialized integers as the 
> column qualifier names so that we can do positional lookups into the List<Cell> 
> we get back from HBase APIs. There are a few "reserved" column qualifiers for 
> things like our empty key value, etc. 
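A minimal sketch of the encoding described above, assuming qualifiers 0 to 10 are reserved so user columns start at 11, and assuming a 4-byte big-endian serialization (not necessarily Phoenix's exact format). ColumnEncodingSketch is hypothetical; in Phoenix the name-to-qualifier mapping lives in PColumn.

```java
import java.nio.ByteBuffer;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch of column encoding: each column name is assigned an
 * integer qualifier, and that integer (not the name) is serialized as the
 * HBase column qualifier. Qualifiers 0..10 are assumed reserved, so user
 * columns start at 11. Big-endian 4-byte ints are an assumption.
 */
final class ColumnEncodingSketch {
    static final int FIRST_USER_QUALIFIER = 11; // assumes reserved range 0..10

    private final Map<String, Integer> qualifierByName = new LinkedHashMap<>();
    private int next = FIRST_USER_QUALIFIER;

    /** Returns the existing qualifier for a column, or assigns the next free one. */
    int qualifierFor(String columnName) {
        return qualifierByName.computeIfAbsent(columnName, name -> next++);
    }

    /** Serializes an integer qualifier into the bytes stored in each cell. */
    static byte[] serialize(int qualifier) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(qualifier).array();
    }

    /** Decodes qualifier bytes back to an integer usable as a list position. */
    static int deserialize(byte[] qualifierBytes) {
        return ByteBuffer.wrap(qualifierBytes).getInt();
    }
}
```

Because the stored qualifier is an integer rather than the column name, each cell in this sketch carries 4 bytes of qualifier regardless of the column name's length, and the decoded integer can be used directly as a position in the returned cell list.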



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
