[
https://issues.apache.org/jira/browse/PHOENIX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Taylor updated PHOENIX-1940:
----------------------------------
Summary: Push expected List<Cell> ordinal position in
KeyValueColumnExpression (was: Push expected List<Cell> ordinal in
KeyValueColumnExpression)
> Push expected List<Cell> ordinal position in KeyValueColumnExpression
> ---------------------------------------------------------------------
>
> Key: PHOENIX-1940
> URL: https://issues.apache.org/jira/browse/PHOENIX-1940
> Project: Phoenix
> Issue Type: Bug
> Reporter: James Taylor
>
> A large share of server-side expression evaluation time goes to the binary
> search that finds the latest Cell value for a column (up to 60% is spent in
> KeyValueUtil.getColumnLatest()). Since we know the set of column qualifiers
> being projected into the scan, we could push the expected position of each
> column into its expression (assuming all columns have values). If the Cell
> is not at that position, we could fall back to a binary search.
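
To make the idea above concrete, here is a minimal sketch of a positional lookup with a binary-search fallback. It is not Phoenix code: the class PositionalLookup, the method find, and the modelling of a row's cells as a sorted list of qualifier strings are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the per-row column lookup. A row's cells are
// modelled as a list of column qualifiers in ascending sorted order.
final class PositionalLookup {

    /**
     * Returns the index of qualifier in sortedQualifiers, or -1 if the row
     * has no cell for that column. expectedPos is the position the column
     * would have if every projected column had a value.
     */
    static int find(List<String> sortedQualifiers, String qualifier, int expectedPos) {
        // Fast path: the cell is exactly where the projection predicted.
        if (expectedPos >= 0
                && expectedPos < sortedQualifiers.size()
                && sortedQualifiers.get(expectedPos).equals(qualifier)) {
            return expectedPos;
        }
        // Slow path: some column is missing or extra, so positions shifted.
        int idx = Collections.binarySearch(sortedQualifiers, qualifier);
        return idx >= 0 ? idx : -1;
    }
}
```

When every projected column has a value, the lookup costs a single comparison; the binary search only runs for rows where a column is absent or an unexpected one is present.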
> Further enhancements could be to allow a NOT NULL constraint on KeyValue
> columns and either a) require a value for every NOT NULL column on an
> UPSERT, or b) do a check-and-put to enforce it (for transactional tables this
> could be enforced).
> Additionally, the table could declare that dynamic columns are not allowed.
> If both of the above are true, then we'd be guaranteed to be able to
> positionally access the List<Cell> that we get back from an HBase Scanner.
> One further enhancement would be to collect a set of all ColumnExpression
> instances on the server side for all expressions sent over. Then, we'd bind
> them once, outside of the general expression evaluation for each row. An
> example of where this would save time would be in evaluating the following
> TPCH-Q1 aggregate query:
> {code}
> SELECT
> l_returnflag,
> l_linestatus,
> sum(l_quantity) as sum_qty,
> sum(l_extendedprice) as sum_base_price,
> sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
> sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
> avg(l_quantity) as avg_qty,
> avg(l_extendedprice) as avg_price,
> avg(l_discount) as avg_disc,
> count(*) as count_order
> FROM
> lineitem
> WHERE
> l_shipdate <= date '1998-12-01' - interval '90' day
> GROUP BY
> l_returnflag,
> l_linestatus
> ORDER BY
> l_returnflag,
> l_linestatus;
> {code}
> During aggregation, the KeyValueColumnExpression for l_extendedprice would be
> evaluated four times, once per occurrence in different SELECT expressions.
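
The bind-once idea described above could look roughly like the following sketch. It is not Phoenix code: the class BindOnceSketch, its methods, and the use of plain maps of doubles in place of Cells are all assumptions made for illustration. Each distinct column is looked up once per row, and every expression then reads from the bound values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: bind each referenced column once per row, then let
// all expressions share the bound values instead of repeating the lookup.
final class BindOnceSketch {
    int lookups = 0; // counts simulated cell lookups

    /** Looks up every referenced column exactly once for the given row. */
    Map<String, Double> bind(Set<String> referenced, Map<String, Double> rowCells) {
        Map<String, Double> bound = new HashMap<>();
        for (String column : referenced) {
            lookups++; // one lookup per distinct column, not per occurrence
            bound.put(column, rowCells.get(column));
        }
        return bound;
    }

    // l_extendedprice * (1 - l_discount)
    static double discPrice(Map<String, Double> bound) {
        return bound.get("l_extendedprice") * (1 - bound.get("l_discount"));
    }

    // l_extendedprice * (1 - l_discount) * (1 + l_tax)
    static double charge(Map<String, Double> bound) {
        return discPrice(bound) * (1 + bound.get("l_tax"));
    }
}
```

In this sketch, evaluating both discPrice and charge for a row touches l_extendedprice through the bound map only; the underlying lookup count stays at one per distinct column regardless of how many expressions reference it.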