[ 
https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13906656#comment-13906656
 ] 

James Taylor commented on PHOENIX-29:
-------------------------------------

There are two cases for ( ! projector.isProjectEmptyKeyValue() ):
1) You're doing a SELECT * wildcard projection. In this case, we want to keep 
all the KeyValues (and basically not include your filter).
2) You're using a mapped VIEW. This is trickier b/c we don't have our empty key 
value to fall back on. The ideal thing to do would be to have your filter add an 
empty key value and then let it do its thing by removing anything that's not 
projected. If you can work this out, that would be a big perf improvement (more 
than 50%) for these scenarios. See 
QueryDatabaseMetaDataTest.testCreateViewOnExistingTable() for an example of 
creating a mapped VIEW.

You may want to differentiate these two by modifying the RowProjector 
constructor. The place you'd need to change is ProjectionCompiler:361:         

return new RowProjector(projectedColumns, estimatedByteSize, isProjectEmptyKeyValue);

Maybe you can pass through a boolean for isWildcard instead of 
isProjectEmptyKeyValue. Then in ParallelIterators:94, instead of checking 
if (projector.isProjectEmptyKeyValue()), have three cases:
if (projector.isWildcard()) {
    // don't add your filter
} else if (table.getViewType() == ViewType.MAPPED) {
    // add your filter, but set some flag to dynamically add an empty key value
} else {
    // do what we do now
}
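
To make the three cases concrete, here is a minimal standalone sketch of the 
decision. None of these types are Phoenix's real classes: FilterChoice, 
ViewKind and Plan are hypothetical stand-ins, and the two arguments stand in 
for whatever projector.isWildcard() and the table's view type would provide.

```java
// Hypothetical stand-ins for illustration only; not the actual Phoenix API.
final class FilterChoice {
    // Assumed simplification of the table's view type.
    enum ViewKind { NOT_A_VIEW, MAPPED, OTHER }

    // What the scan setup would do with the custom column filter.
    enum Plan { NO_FILTER, FILTER_WITH_ADDED_EMPTY_KV, FILTER_AS_TODAY }

    static Plan choose(boolean isWildcard, ViewKind viewKind) {
        if (isWildcard) {
            // SELECT *: every KeyValue is wanted, so the filter is not added.
            return Plan.NO_FILTER;
        } else if (viewKind == ViewKind.MAPPED) {
            // Mapped VIEW: no empty key value exists, so the filter would be
            // flagged to add one dynamically.
            return Plan.FILTER_WITH_ADDED_EMPTY_KV;
        } else {
            // Everything else keeps the current behavior.
            return Plan.FILTER_AS_TODAY;
        }
    }
}
```

The wildcard check comes first so that a SELECT * over a mapped VIEW still 
skips the filter, which matches the order of the cases above.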


> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-29
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-29
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>         Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than in 
> selecting any other column. The reason is that when you project a column into 
> a Scan, HBase uses its ExplicitColumnTracker which does a reseek to the 
> column. The only case where this is not necessary is when the column is the 
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient 
> to just do a NEXT instead of a reseek (especially if your KV is the next 
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that 
> need to be returned back to the client which is another advantage we'd get 
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and 
> merge between them and the incoming KVs, using NEXT instead of a reseek. We 
> could potentially use a reseek if the number of columns in the table is 
> beyond a certain threshold.
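
The merge described in the issue can be sketched without any HBase 
dependency. This is an illustration under stated assumptions, not the filter 
from the attached patches: column qualifiers are modeled as plain strings, 
String.compareTo stands in for KeyValue.COMPARATOR, and ColumnMergeSketch, 
Result and scanRow are hypothetical names. Each loop iteration models one 
NEXT; no reseek is ever issued.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch; strings stand in for KeyValues.
final class ColumnMergeSketch {
    static final class Result {
        final List<String> returned = new ArrayList<>(); // sent back to the client
        final List<String> forWhere = new ArrayList<>(); // seen by the WHERE clause
        int cellsVisited;                                // cells reached via NEXT
    }

    // incoming must already be sorted, as cells within a row are.
    static Result scanRow(List<String> incoming, Set<String> selectCols, Set<String> whereCols) {
        // Sort the union of both lists once, up front.
        TreeSet<String> wanted = new TreeSet<>(selectCols);
        wanted.addAll(whereCols);
        Iterator<String> it = wanted.iterator();
        String target = it.hasNext() ? it.next() : null;

        Result result = new Result();
        for (String qualifier : incoming) {
            // Drop wanted columns that sort before the current cell: either
            // already handled or absent from this row.
            while (target != null && target.compareTo(qualifier) < 0) {
                target = it.hasNext() ? it.next() : null;
            }
            if (target == null) {
                break; // nothing else is wanted; the rest of the row can be skipped
            }
            result.cellsVisited++;
            if (target.equals(qualifier)) {
                if (selectCols.contains(qualifier)) result.returned.add(qualifier);
                if (whereCols.contains(qualifier)) result.forWhere.add(qualifier);
            }
            // Otherwise the cell is unwanted and the loop simply moves on (NEXT).
        }
        return result;
    }
}
```

In this model, a row with columns a through e, a select list of d and a WHERE 
list of b visits four cells and returns only d to the client. The threshold 
idea from the description would correspond to switching from this linear 
walk to a reseek when the gap to the next wanted column is large.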



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
