[ https://issues.apache.org/jira/browse/HIVE-15516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matt McCline resolved HIVE-15516.
---------------------------------
    Resolution: Duplicate

Duplicate of HIVE-15588.

> Unable to vectorize select statement having case-when with GenericUDFOPGreaterThan expr
> ---------------------------------------------------------------------------------------
>
>                 Key: HIVE-15516
>                 URL: https://issues.apache.org/jira/browse/HIVE-15516
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>
> The first query listed below does not get vectorized; without the "case-when" expression, it does.
> {noformat}
> hive> explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales;
> explain select sum(case when ss_quantity > 1 then ss_quantity * ss_wholesale_cost else 0 end) from store_sales
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: rbalamohan_20161227045137_c7a736c6-1812-4c8f-974e-7f7fcc7b1513:28
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: CASE WHEN ((ss_quantity > 1)) THEN ((UDFToDouble(ss_quantity) * ss_wholesale_cost)) ELSE (0) END (type: double)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       aggregations: sum(_col0)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col0 (type: double)
>             Execution mode: llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> ....
> ....
> 2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
> 2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Unable to use the VectorUDFAdaptor. Encountered unsupported expr desc : GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1)
> 2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: Cannot vectorize select expression: GenericUDFWhen(GenericUDFOPGreaterThan(Column[ss_quantity], Const int 1), GenericUDFOPMultiply(GenericUDFBridge ==> UDFToDouble (Column[ss_quantity]), Column[ss_wholesale_cost]), Const int 0)
> 2016-12-27T04:53:20,507  INFO [16185d97-97f4-477e-9436-4d2b98add389 main] physical.Vectorizer: MapWork Operator: SEL could not be vectorized.
> ....
> ....
> hive> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales;
> explain select sum(ss_quantity * ss_wholesale_cost) from store_sales
> OK
> STAGE DEPENDENCIES:
>   Stage-1 is a root stage
>   Stage-0 depends on stages: Stage-1
> STAGE PLANS:
>   Stage: Stage-1
>     Tez
>       DagId: rbalamohan_20161227045112_8311df89-31fb-47ee-ad70-f702a85527cc:27
>       Edges:
>         Reducer 2 <- Map 1 (SIMPLE_EDGE)
>       DagName:
>       Vertices:
>         Map 1
>             Map Operator Tree:
>                 TableScan
>                   alias: store_sales
>                   Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
>                   Select Operator
>                     expressions: (UDFToDouble(ss_quantity) * ss_wholesale_cost) (type: double)
>                     outputColumnNames: _col0
>                     Statistics: Num rows: 28800426268 Data size: 330048503520 Basic stats: COMPLETE Column stats: COMPLETE
>                     Group By Operator
>                       aggregations: sum(_col0)
>                       mode: hash
>                       outputColumnNames: _col0
>                       Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                       Reduce Output Operator
>                         sort order:
>                         Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                         value expressions: _col0 (type: double)
>             Execution mode: vectorized, llap
>             LLAP IO: all inputs
>         Reducer 2
>             Execution mode: vectorized, llap
>             Reduce Operator Tree:
>               Group By Operator
>                 aggregations: sum(VALUE._col0)
>                 mode: mergepartial
>                 outputColumnNames: _col0
>                 Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                 File Output Operator
>                   compressed: false
>                   Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: COMPLETE
>                   table:
>                       input format: org.apache.hadoop.mapred.SequenceFileInputFormat
>                       output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>                       serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
>       Processor Tree:
>         ListSink
> {noformat}
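
The difference between the two plans shows up in the "Execution mode" line of Map 1 ("llap" versus "vectorized, llap"). A minimal Python sketch of how that check could be done mechanically on a single EXPLAIN output follows. The function names are hypothetical, and the parsing assumes the text layout pasted above (a vertex header such as "Map 1" followed by an optional "Execution mode:" line); it is not part of Hive.

```python
import re

# Vertex headers look like "Map 1" or "Reducer 2" on a line of their own.
VERTEX_RE = re.compile(r"^(?:Map|Reducer) \d+$")
MODE_RE = re.compile(r"^Execution mode:\s*(.*)$")


def vertex_execution_modes(plan_text):
    """Map each vertex name to its 'Execution mode' string ('' if absent)."""
    modes = {}
    current = None
    for raw in plan_text.splitlines():
        # Drop e-mail quote markers ("> ") and indentation.
        line = raw.lstrip("> ").strip()
        if VERTEX_RE.match(line):
            current = line
            modes[current] = ""
            continue
        match = MODE_RE.match(line)
        if match and current is not None:
            modes[current] = match.group(1)
    return modes


def non_vectorized_vertices(plan_text):
    """Return the vertices whose execution mode does not include 'vectorized'."""
    result = []
    for vertex, mode in vertex_execution_modes(plan_text).items():
        tokens = [part.strip() for part in mode.split(",")]
        if "vectorized" not in tokens:
            result.append(vertex)
    return result
```

Pass one plan at a time: vertex names repeat across queries, so concatenated plans would overwrite each other's entries. On the first plan above this would flag Map 1 only; on the second it would flag nothing.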



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
