James Taylor created PHOENIX-1940:
-------------------------------------
Summary: Push expected List<Cell> ordinal in KeyValueColumnExpression
Key: PHOENIX-1940
URL: https://issues.apache.org/jira/browse/PHOENIX-1940
Project: Phoenix
Issue Type: Bug
Reporter: James Taylor
It looks like quite a bit of time is spent in the binary search that finds the
latest Cell value when we evaluate expressions on the server side (up to 60%
is spent in KeyValueUtil.getColumnLatest()). Since we know the set of column
qualifiers being projected into the scan, we could push the expected position
of each column in the List<Cell> (assuming all columns have values) into its
KeyValueColumnExpression. If the Cell is not in that position, we could fall
back to a binary search.
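The lookup described above could look roughly like the following. This is only a minimal sketch and not Phoenix code: SimpleCell and PositionalCellLookup are hypothetical stand-ins (a real version would work on HBase Cell instances), and it assumes the row's cells are sorted by qualifier with one cell per qualifier.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for an HBase Cell: only a column qualifier and a value. */
final class SimpleCell {
    final byte[] qualifier;
    final byte[] value;

    SimpleCell(byte[] qualifier, byte[] value) {
        this.qualifier = qualifier;
        this.value = value;
    }
}

/** Hypothetical helper: try the expected ordinal first, then fall back to binary search. */
final class PositionalCellLookup {
    /**
     * Returns the cell for the given qualifier, or null if the row has none.
     * Assumes cells is sorted by qualifier (unsigned byte order), one cell per qualifier.
     */
    static SimpleCell find(List<SimpleCell> cells, byte[] qualifier, int expectedOrdinal) {
        // Fast path: the cell sits where it would be if every projected column had a value.
        if (expectedOrdinal >= 0 && expectedOrdinal < cells.size()) {
            SimpleCell candidate = cells.get(expectedOrdinal);
            if (Arrays.equals(candidate.qualifier, qualifier)) {
                return candidate;
            }
        }
        // Slow path: some column is missing (or extra), so search by qualifier.
        int lo = 0;
        int hi = cells.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareUnsigned(cells.get(mid).qualifier, qualifier);
            if (cmp < 0) {
                lo = mid + 1;
            } else if (cmp > 0) {
                hi = mid - 1;
            } else {
                return cells.get(mid);
            }
        }
        return null;
    }

    /** Lexicographic comparison treating each byte as unsigned. */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = (a[i] & 0xff) - (b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return a.length - b.length;
    }
}
```

When every projected column has a value, the fast path costs a single qualifier comparison; the binary search only runs for rows where a column is absent or an unexpected cell shifts the positions.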
A further enhancement could be to allow a NOT NULL constraint on KeyValue
columns and either a) require values for all NOT NULL columns to be provided
on an UPSERT, or b) do a check-and-put to enforce it (for transactional tables
this could be enforced).
Additionally, the table could declare that dynamic columns are not allowed. If
both of the above are true, then we'd be guaranteed to be able to positionally
access the List<Cell> that we get back from an HBase Scanner.
One further enhancement would be to collect a set of all ColumnExpression
instances on the server side for all expressions sent over. Then, we'd bind
them once, outside of the general expression evaluation for each row. An
example of where this would save time would be in evaluating the following
TPCH-Q1 aggregate query:
{code}
SELECT
    l_returnflag,
    l_linestatus,
    sum(l_quantity) as sum_qty,
    sum(l_extendedprice) as sum_base_price,
    sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
    sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
    avg(l_quantity) as avg_qty,
    avg(l_extendedprice) as avg_price,
    avg(l_discount) as avg_disc,
    count(*) as count_order
FROM
    lineitem
WHERE
    l_shipdate <= date '1998-12-01' - interval '90' day
GROUP BY
    l_returnflag,
    l_linestatus
ORDER BY
    l_returnflag,
    l_linestatus;
{code}
During aggregation, the KeyValueColumnExpression for l_extendedprice would be
evaluated four times, once per occurrence in different SELECT expressions.
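One way to picture the bind-once idea is the sketch below. It is an illustration under simplifying assumptions, not Phoenix's expression API: BoundColumns is a hypothetical class, columns are identified by name, and values are plain doubles rather than byte arrays. Each distinct column gets one slot, the slots are filled once per row, and every expression that references the column reads the slot.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical per-scan registry of the distinct columns referenced by all expressions. */
final class BoundColumns {
    private final Map<String, Integer> slotByName = new LinkedHashMap<>();
    private double[] values = new double[0];
    /** Counts row lookups so the saving is observable. */
    int lookups = 0;

    /** Registers a column reference and returns its slot; repeated references share a slot. */
    int register(String column) {
        Integer slot = slotByName.get(column);
        if (slot == null) {
            slot = slotByName.size();
            slotByName.put(column, slot);
        }
        return slot;
    }

    /** Resolves each distinct column exactly once for the given row. */
    void bind(Map<String, Double> row) {
        if (values.length != slotByName.size()) {
            values = new double[slotByName.size()];
        }
        for (Map.Entry<String, Integer> e : slotByName.entrySet()) {
            lookups++;
            // A missing column is represented as NaN in this simplified sketch.
            values[e.getValue()] = row.getOrDefault(e.getKey(), Double.NaN);
        }
    }

    /** Reads the value bound for the current row; no lookup happens here. */
    double get(int slot) {
        return values[slot];
    }
}
```

With the query above, the four references to l_extendedprice would register to the same slot, so binding a row resolves that column once instead of four times.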