[jira] [Updated] (PHOENIX-29) Add custom filter to more efficiently navigate KeyValues in row

Anoop Sam John (JIRA) Sun, 23 Feb 2014 12:01:06 -0800

     [ 
https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Anoop Sam John updated PHOENIX-29:
----------------------------------

    Attachment: PHOENIX-29_V5.patch

The unit test run on the V4 patch produced some test failures, and while 
investigating them I found the cause. The V5 patch fixes this issue as well. 
The change, in ParallelIterators.java, is:
{code}
-            if (familyMap.isEmpty() && table.getColumnFamilies().size() == 1) {
+            // Where condition columns also will get added into familyMap
+            // When where conditions are present, we can not add FirstKeyOnlyFilter at beginning.
+            if (familyMap.isEmpty() && context.getWhereCoditionColumns().isEmpty()
+                    && table.getColumnFamilies().size() == 1) {
{code}
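
To illustrate why the extra check matters, here is a minimal sketch in plain Java. It does not use the Phoenix or HBase classes: a row is modelled as a sorted map of column name to value, and firstKeyOnly and whereMatches are hypothetical helpers that only mimic "keep the first KV of the row" and "evaluate a WHERE condition on one column". The assumption is that a first-key-only step running ahead of the WHERE evaluation drops the column the condition needs.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

final class FirstKeyOnlySketch {

    /** Hypothetical helper: keeps only the first cell of a row, as a first-key-only filter would. */
    static SortedMap<String, String> firstKeyOnly(SortedMap<String, String> row) {
        SortedMap<String, String> out = new TreeMap<>();
        if (!row.isEmpty()) {
            String first = row.firstKey();
            out.put(first, row.get(first));
        }
        return out;
    }

    /** Hypothetical helper: evaluates "column = expected" against the cells that are still visible. */
    static boolean whereMatches(Map<String, String> cells, String column, String expected) {
        return expected.equals(cells.get(column));
    }

    public static void main(String[] args) {
        SortedMap<String, String> row = new TreeMap<>();
        row.put("a", "1");
        row.put("b", "2");

        // With the full row the condition b = 2 holds.
        System.out.println(whereMatches(row, "b", "2"));               // true
        // After keeping only the first cell, column b is gone and the row is wrongly rejected.
        System.out.println(whereMatches(firstKeyOnly(row), "b", "2")); // false
    }
}
```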


> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch, 
> PHOENIX-29_V5.patch, PHOENIX-29_v3.patch, PHOENIX-29_v4.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than at 
> selecting any other column. The reason is that when you project a column into 
> a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the 
> column. The only case where this is not necessary is when the column is the 
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient 
> to just do a NEXT instead of a reseek (especially if your KV is the next 
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that 
> need to be returned to the client, which is another advantage we'd get from 
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and 
> merge between them and the incoming KVs, using NEXT instead of a reseek. We 
> could potentially use a reseek if the number of columns in the table is 
> beyond a certain threshold.
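
The merge described above can be sketched in plain Java as follows. This is only an illustration under stated assumptions, not the filter from the patch: columns are modelled as plain strings in their natural sort order rather than KeyValues compared with KeyValue.COMPARATOR, and the Hint enum and selectColumns method are hypothetical names that stand in for a filter's per-KV decision (include, step to the next column, or move on to the next row).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

final class ColumnMergeSketch {

    /** Hypothetical stand-ins for the decision a filter makes for each incoming KV. */
    enum Hint { INCLUDE, NEXT_COL, NEXT_ROW }

    /**
     * Merges the sorted incoming columns of one row against the sorted wanted
     * columns, stepping forward one column at a time and never seeking.
     * Each decision is appended to trace; the kept columns are returned.
     */
    static List<String> selectColumns(List<String> incomingSorted,
                                      TreeSet<String> wanted,
                                      List<Hint> trace) {
        List<String> kept = new ArrayList<>();
        Iterator<String> it = wanted.iterator();
        String target = it.hasNext() ? it.next() : null;
        for (String col : incomingSorted) {
            // Skip wanted columns that sort before the incoming one: this row lacks them.
            while (target != null && target.compareTo(col) < 0) {
                target = it.hasNext() ? it.next() : null;
            }
            if (target == null) {
                // Nothing left to find in this row.
                trace.add(Hint.NEXT_ROW);
                break;
            }
            if (target.equals(col)) {
                kept.add(col);
                trace.add(Hint.INCLUDE);
                target = it.hasNext() ? it.next() : null;
            } else {
                // Not wanted: step to the next column instead of reseeking.
                trace.add(Hint.NEXT_COL);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Hint> trace = new ArrayList<>();
        List<String> kept = selectColumns(
                Arrays.asList("a", "b", "c", "d", "e"),
                new TreeSet<>(Arrays.asList("b", "d")),
                trace);
        System.out.println(kept);  // [b, d]
        System.out.println(trace); // [NEXT_COL, INCLUDE, NEXT_COL, INCLUDE, NEXT_ROW]
    }
}
```

A threshold on the number of columns, as suggested in the description, could switch this loop to a seek when the next wanted column is expected to be far away; the sketch leaves that out.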



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
