[jira] [Comment Edited] (PHOENIX-1940) Push expected List<Cell> ordinal position in KeyValueColumnExpression

Lars Hofhansl (JIRA) Wed, 29 Apr 2015 16:24:13 -0700

    [ https://issues.apache.org/jira/browse/PHOENIX-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14520502#comment-14520502 ]


Lars Hofhansl edited comment on PHOENIX-1940 at 4/29/15 11:22 PM:
------------------------------------------------------------------

And imported with this: {{psql.py -t LINEITEM -d '|' localhost lineitem.csv}} 
after ungzipping the data file and renaming it to lineitem.csv.


was (Author: lhofhansl):
And imported with this: {{psql.py -t LINEITEM -d '|' phoenix-1:2181 
lineitem.csv}} after ungzipping and renaming the data file to csv.

> Push expected List<Cell> ordinal position in KeyValueColumnExpression
> ---------------------------------------------------------------------
>
>                 Key: PHOENIX-1940
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-1940
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: James Taylor
>
> Looks like quite a bit of time is spent in the binary search done to get the 
> latest Cell value when we're evaluating expressions on the server side (up to 
> 60% is spent in KeyValueUtil.getColumnLatest()). Since we know the set of 
> column qualifiers being projected into the scan, we could push the expected 
> position (assuming all columns have values). If the Cell is not in that 
> position, we could fall back to a binary search.
> Further enhancements could be to allow a NOT NULL constraint on KeyValue 
> columns and either a) require all non-null values to be provided on an 
> UPSERT, or b) do a check-and-put to enforce it (for transactional tables this 
> could be enforced).
> Additionally, the table could declare that dynamic columns are not allowed. 
> If both of the above are true, then we'd be able to guarantee positional 
> access into the List<Cell> that we get back from an HBase Scanner.
> One further enhancement would be to collect a set of all ColumnExpression 
> instances on the server side for all expressions sent over. Then, we'd bind 
> them once, outside of the general expression evaluation of all expressions in 
> a statement for a given row. An example of where this would save time would 
> be in evaluating the following TPCH-Q1 aggregate query:
> {code}
> SELECT
>     l_returnflag,
>     l_linestatus,
>     sum(l_quantity) as sum_qty,
>     sum(l_extendedprice) as sum_base_price,
>     sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
>     sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
>     avg(l_quantity) as avg_qty,
>     avg(l_extendedprice) as avg_price,
>     avg(l_discount) as avg_disc,
>     count(*) as count_order
> FROM
>     lineitem
> WHERE
>     l_shipdate <= date '1998-12-01' - interval '90' day
> GROUP BY
>     l_returnflag,
>     l_linestatus
> ORDER BY
>     l_returnflag,
>     l_linestatus;
> {code}
> During aggregation, the KeyValueColumnExpression for l_extendedprice would be 
> evaluated four times currently, once per occurrence in different SELECT 
> expressions. This enhancement would cut that down to once.

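For illustration, the "expected position first, binary search as fallback" idea from the description might look roughly like the sketch below. This is not Phoenix or HBase code: {{PositionalLookup}}, {{Col}} and {{find}} are hypothetical names, and a plain list sorted by qualifier stands in for the List<Cell> returned by the scanner.

```java
import java.util.List;

public class PositionalLookup {
    /** Hypothetical stand-in for an HBase Cell: a qualifier plus a value. */
    public static final class Col {
        public final String qualifier;
        public final String value;
        public Col(String qualifier, String value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /** Counts how often the fallback path was taken (for demonstration only). */
    public static int binarySearches = 0;

    /**
     * Finds a column in a row sorted by qualifier. The expected ordinal
     * position is checked first; on a miss (for example because an earlier
     * column has no value) the method falls back to a binary search.
     * Returns null if the qualifier is not present.
     */
    public static Col find(List<Col> row, String qualifier, int expectedPos) {
        if (expectedPos >= 0 && expectedPos < row.size()
                && row.get(expectedPos).qualifier.equals(qualifier)) {
            return row.get(expectedPos); // fast path: no search needed
        }
        binarySearches++;
        int lo = 0;
        int hi = row.size() - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = row.get(mid).qualifier.compareTo(qualifier);
            if (cmp == 0) {
                return row.get(mid);
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;
    }
}
```

When every projected column has a value, each lookup hits the fast path; a row with a missing column only costs one extra comparison before the search.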

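The last enhancement in the description (bind each distinct column once per row, then evaluate all expressions against the bound values) could be sketched as follows. Again this is only an assumption-laden illustration: {{BindOnce}}, {{BoundRow}} and {{evaluateQ1Terms}} are made-up names, a Map stands in for the row, and the counter stands in for the per-column cell lookup.

```java
import java.util.HashMap;
import java.util.Map;

public class BindOnce {
    /** Counts simulated cell lookups (for demonstration only). */
    public static int lookups = 0;

    /** Hypothetical per-row binding that resolves each distinct column once. */
    public static final class BoundRow {
        private final Map<String, Double> raw;
        private final Map<String, Double> bound = new HashMap<>();

        public BoundRow(Map<String, Double> raw) {
            this.raw = raw;
        }

        /** Assumes the column has a value in this row. */
        public double get(String column) {
            Double v = bound.get(column);
            if (v == null) {
                lookups++;            // stands in for the search in the cell list
                v = raw.get(column);
                bound.put(column, v);
            }
            return v;
        }
    }

    /** The four per-row terms of TPCH-Q1 that reference l_extendedprice. */
    public static double[] evaluateQ1Terms(BoundRow r) {
        return new double[] {
            r.get("l_extendedprice"),                                  // sum_base_price
            r.get("l_extendedprice") * (1 - r.get("l_discount")),      // sum_disc_price
            r.get("l_extendedprice") * (1 - r.get("l_discount"))
                    * (1 + r.get("l_tax")),                            // sum_charge
            r.get("l_extendedprice")                                   // avg_price
        };
    }
}
```

With this shape, l_extendedprice is looked up once per row even though four SELECT expressions reference it.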

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
