[
https://issues.apache.org/jira/browse/PHOENIX-29?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13909480#comment-13909480
]
James Taylor commented on PHOENIX-29:
-------------------------------------
Nice, [~anoop.hbase]! I like the way you're tracking the cq:cf pairs used in
the WHERE clause, as there's a further optimization we can do for mapped views
in a follow-up change. It looks like you don't even need to use
RowProjector.isEmptyKeyValueProjected()? One small change is needed: the
empty column family name depends on the schema, so we can't assume it's always
the default one. You need to pass this family name through the constructor
and serialize/deserialize it, like this:
{code}
ScanUtil.andFilterAtEnd(scan,
    new ColumnProjectionFilter(columnsTracker,
        SchemaUtil.getEmptyColumnFamily(tableRef.getTable())));
{code}
and then update this check to use the member variable in ColumnProjectionFilter
instead of using QueryConstants.DEFAULT_COLUMN_FAMILY_BYTES:
{code}
kvs.add(new KeyValue(rk, emptyColumnFamily, QueryConstants.EMPTY_COLUMN_BYTES));
{code}
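To illustrate the "pass it through the constructor and serialize/deserialize it" part, here is a minimal, self-contained sketch. It assumes the filter is serialized Writable-style through write(DataOutput)/readFields(DataInput). The class EmptyFamilyFilterSketch and its roundTrip helper are hypothetical stand-ins, not the actual ColumnProjectionFilter from the patch, and only the empty-column-family handling is shown:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for the filter; not the class from the patch.
final class EmptyFamilyFilterSketch {
    // Schema-dependent empty column family, supplied by the caller
    // instead of being assumed to be the default family.
    private byte[] emptyColumnFamily = new byte[0];

    // No-arg constructor so the receiving side can instantiate the
    // filter before calling readFields().
    EmptyFamilyFilterSketch() {
    }

    EmptyFamilyFilterSketch(byte[] emptyColumnFamily) {
        this.emptyColumnFamily = emptyColumnFamily.clone();
    }

    byte[] getEmptyColumnFamily() {
        return emptyColumnFamily.clone();
    }

    // Serialize the family name as a length-prefixed byte array.
    void write(DataOutput out) throws IOException {
        out.writeInt(emptyColumnFamily.length);
        out.write(emptyColumnFamily);
    }

    // Restore the family name written by write().
    void readFields(DataInput in) throws IOException {
        int length = in.readInt();
        byte[] family = new byte[length];
        in.readFully(family);
        emptyColumnFamily = family;
    }

    // Illustration only: serialize a filter and read it back into a new instance.
    static EmptyFamilyFilterSketch roundTrip(EmptyFamilyFilterSketch filter) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            filter.write(new DataOutputStream(bytes));
            EmptyFamilyFilterSketch copy = new EmptyFamilyFilterSketch();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the sketch is that the family name travels with the filter, so the server side sees the same schema-dependent value the client passed in.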
> Add custom filter to more efficiently navigate KeyValues in row
> ---------------------------------------------------------------
>
> Key: PHOENIX-29
> URL: https://issues.apache.org/jira/browse/PHOENIX-29
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
> Attachments: PHOENIX-29.patch, PHOENIX-29_V2.patch,
> PHOENIX-29_v3.patch
>
>
> Currently HBase is 50% faster at selecting the first KV in a row than at
> selecting any other column. The reason is that when you project a column into
> a Scan, HBase uses its ExplicitColumnTracker, which does a reseek to the
> column. The only case where this is not necessary is when the column is the
> first one.
> In most cases (unless you have thousands of versions), it'd be more efficient
> to just do a NEXT instead of a reseek (especially if your KV is the next
> one). We can provide our own custom filter through which we pass two lists:
> 1) all KVs referenced in the select expressions. These are the only ones that
> need to be returned to the client, which is another advantage we'd get from
> writing this custom filter.
> 2) all KVs referenced in the WHERE clause.
> The filter could sort the KVs using the standard KeyValue.COMPARATOR and
> merge between them and the incoming KVs, using NEXT instead of a reseek. We
> could potentially use a reseek if the number of columns in the table is
> beyond a certain threshold.
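
The merge described in the issue above can be sketched roughly as follows. This is a simplified illustration, not the filter from the attached patches. Columns are modeled as plain "cf:cq" strings compared lexicographically rather than real KeyValues ordered by KeyValue.COMPARATOR. The select and WHERE lists are folded into a single set of referenced columns. The Decision enum is hypothetical, loosely modeled on a filter's per-KV return codes:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch of a merge-style projection; all names are illustrative.
final class MergeProjectionSketch {
    // What to do with the current incoming KV.
    enum Decision { INCLUDE, SKIP, NEXT_ROW }

    private final List<String> wanted; // referenced columns, sorted
    private int pos;                   // cursor into 'wanted' for the current row

    MergeProjectionSketch(Iterable<String> referencedColumns) {
        TreeSet<String> sorted = new TreeSet<>();
        for (String column : referencedColumns) {
            sorted.add(column);
        }
        this.wanted = new ArrayList<>(sorted);
    }

    // Reset the cursor at the start of each row.
    void startRow() {
        pos = 0;
    }

    // Called once per incoming KV, in sorted order. Stepping to the next KV
    // (SKIP) takes the place of a reseek to the next wanted column.
    Decision onKeyValue(String column) {
        // Advance past wanted columns that sort before the incoming one;
        // they are absent from this row.
        while (pos < wanted.size() && wanted.get(pos).compareTo(column) < 0) {
            pos++;
        }
        if (pos == wanted.size()) {
            return Decision.NEXT_ROW; // nothing left to match in this row
        }
        // The cursor is not advanced on a match, so further versions of the
        // same column would also be included.
        return wanted.get(pos).equals(column) ? Decision.INCLUDE : Decision.SKIP;
    }

    // Illustration only: run the merge over one row's sorted columns.
    static List<String> projectRow(List<String> incomingSorted, Iterable<String> referenced) {
        MergeProjectionSketch filter = new MergeProjectionSketch(referenced);
        filter.startRow();
        List<String> kept = new ArrayList<>();
        for (String column : incomingSorted) {
            Decision decision = filter.onKeyValue(column);
            if (decision == Decision.NEXT_ROW) {
                break;
            }
            if (decision == Decision.INCLUDE) {
                kept.add(column);
            }
        }
        return kept;
    }
}
```

A threshold-based fallback to a reseek, as suggested for very wide tables, is left out of this sketch.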