[ https://issues.apache.org/jira/browse/HIVE-15516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt McCline resolved HIVE-15516.
---------------------------------
Resolution: Duplicate
Duplicate of HIVE-15588.
> Unable to vectorize select statement having case-when with GenericUDFOPGreaterThan expr
> ---------------------------------------------------------------------------------------
>
> Key: HIVE-15516
> URL: https://issues.apache.org/jira/browse/HIVE-15516
> Project: Hive
> Issue Type: Bug
> Reporter: Rajesh Balamohan
>
> The first query listed below does not get vectorized; without the "case-when"
> expression, the same query does get vectorized.
> {noformat}
> hive> explain select sum(case when ss_quantity > 1 then ss_quantity *
> ss_wholesale_cost else 0 end) from store_sales;
> explain select sum(case when ss_quantity > 1 then ss_quantity *
> ss_wholesale_cost else 0 end) from store_sales
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: rbalamohan_20161227045137_c7a736c6-1812-4c8f-974e-7f7fcc7b1513:28
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName:
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> Statistics: Num rows: 28800426268 Data size: 330048503520
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: CASE WHEN ((ss_quantity > 1)) THEN
> ((UDFToDouble(ss_quantity) * ss_wholesale_cost)) ELSE (0) END (type: double)
> outputColumnNames: _col0
> Statistics: Num rows: 28800426268 Data size: 330048503520
> Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: sum(_col0)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats:
> COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: double)
> Execution mode: llap
> LLAP IO: all inputs
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: COMPLETE
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> ....
> ....
> 2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main]
> physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
> 2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main]
> physical.Vectorizer: Unable to use the VectorUDFAdaptor. Encountered
> unsupported expr desc : GenericUDFOPGreaterThan(Column[ss_quantity], Const
> int 1)
> 2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main]
> physical.Vectorizer: Cannot vectorize select expression:
> GenericUDFWhen(GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1),
> GenericUDFOPMultiply(GenericUDFBridge ==> UDFToDouble (Column[ss_quantity]),
> Column[ss_wholesale_cost]), Const int 0)
> 2016-12-27T04:53:20,507 INFO [16185d97-97f4-477e-9436-4d2b98add389 main]
> physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
> ....
> ....
> hive> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales;
> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales
> OK
> STAGE DEPENDENCIES:
> Stage-1 is a root stage
> Stage-0 depends on stages: Stage-1
> STAGE PLANS:
> Stage: Stage-1
> Tez
> DagId: rbalamohan_20161227045112_8311df89-31fb-47ee-ad70-f702a85527cc:27
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> DagName:
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> alias: store_sales
> Statistics: Num rows: 28800426268 Data size: 330048503520
> Basic stats: COMPLETE Column stats: COMPLETE
> Select Operator
> expressions: (UDFToDouble(ss_quantity) *
> ss_wholesale_cost) (type: double)
> outputColumnNames: _col0
> Statistics: Num rows: 28800426268 Data size: 330048503520
> Basic stats: COMPLETE Column stats: COMPLETE
> Group By Operator
> aggregations: sum(_col0)
> mode: hash
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats:
> COMPLETE Column stats: COMPLETE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1 Data size: 8 Basic stats:
> COMPLETE Column stats: COMPLETE
> value expressions: _col0 (type: double)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Reducer 2
> Execution mode: vectorized, llap
> Reduce Operator Tree:
> Group By Operator
> aggregations: sum(VALUE._col0)
> mode: mergepartial
> outputColumnNames: _col0
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: COMPLETE
> File Output Operator
> compressed: false
> Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE
> Column stats: COMPLETE
> table:
> input format:
> org.apache.hadoop.mapred.SequenceFileInputFormat
> output format:
> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
> serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
> Stage: Stage-0
> Fetch Operator
> limit: -1
> Processor Tree:
> ListSink
> {noformat}
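
The difference between the two plans quoted above comes down to the "Execution mode:" line of Map 1. As a minimal sketch (not part of the original report), that comparison can be done mechanically. The helper names `vertex_execution_modes` and `non_vectorized` are hypothetical, and the two plan excerpts are trimmed from the EXPLAIN output above. The sketch assumes each vertex name sits on its own line and that its "Execution mode:" line follows the vertex's operator tree, as it does in this output.

```python
import re

# A vertex header such as "Map 1" or "Reducer 2" on a line by itself.
VERTEX = re.compile(r"(Map|Reducer) \d+")
# The per-vertex mode line, e.g. "Execution mode: vectorized, llap".
MODE = re.compile(r"Execution mode: (.+)")


def vertex_execution_modes(plan_text):
    """Map each vertex name to the set of flags on its 'Execution mode:' line."""
    modes = {}
    current = None
    for raw in plan_text.splitlines():
        # Drop email quote markers and indentation before matching.
        line = raw.lstrip("> ").strip()
        if VERTEX.fullmatch(line):
            current = line
            modes[current] = set()
            continue
        match = MODE.fullmatch(line)
        if match and current is not None:
            modes[current] = {flag.strip() for flag in match.group(1).split(",")}
    return modes


def non_vectorized(plan_text):
    """Return the sorted names of vertices whose mode lacks 'vectorized'."""
    return sorted(
        vertex
        for vertex, flags in vertex_execution_modes(plan_text).items()
        if "vectorized" not in flags
    )


# Excerpt of the plan for the query with case-when (trimmed from the report).
CASE_WHEN_PLAN = """\
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> Execution mode: llap
> LLAP IO: all inputs
> Reducer 2
> Execution mode: vectorized, llap
"""

# Excerpt of the plan for the query without case-when (trimmed from the report).
PLAIN_PLAN = """\
> Edges:
> Reducer 2 <- Map 1 (SIMPLE_EDGE)
> Vertices:
> Map 1
> Map Operator Tree:
> TableScan
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> Reducer 2
> Execution mode: vectorized, llap
"""

if __name__ == "__main__":
    print(non_vectorized(CASE_WHEN_PLAN))
    print(non_vectorized(PLAIN_PLAN))
```

Run against the excerpts, this flags Map 1 only for the case-when query, which matches the "MapWork Operator: SEL could not be vectorized" log lines in the report.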