[ https://issues.apache.org/jira/browse/HIVE-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303858#comment-15303858 ]
Matt McCline commented on HIVE-13872:
-------------------------------------
LazySimpleDeserializeRead.readCheckNull does use a columnsToInclude feature to
skip unwanted fields without decoding their contents. The parse does still
scan for the field separators, though. Also, VectorMapOperator attempts to
truncate the included column array on the right (not sure if this is working).
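
To make the two behaviours above concrete, here is a minimal sketch in plain Java. It is not Hive's LazySimpleDeserializeRead or VectorMapOperator code: the class IncludeAwareRowParser, its methods, the '|' separator, and the use of null for a skipped field are all assumptions made for illustration. It shows that the separator scan still runs for unwanted fields while their contents are never decoded, and that scanning can stop after the last wanted column (the "truncate on the right" idea).

```java
import java.util.Arrays;

/** Hypothetical illustration only; not Hive's actual deserializer. */
final class IncludeAwareRowParser {

    /** Index of the last wanted column, or -1 if none is wanted. */
    static int lastIncluded(boolean[] columnsToInclude) {
        for (int i = columnsToInclude.length - 1; i >= 0; i--) {
            if (columnsToInclude[i]) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Splits one delimited row. Skipped columns are left as null
     * (an assumption of this sketch, not a statement about Hive).
     */
    static String[] parse(String row, char separator, boolean[] columnsToInclude) {
        String[] fields = new String[columnsToInclude.length];
        // Truncation on the right: nothing past this index is scanned.
        int last = lastIncluded(columnsToInclude);
        int start = 0;
        for (int col = 0; col <= last && start <= row.length(); col++) {
            // The separator scan happens for every field, wanted or not.
            int end = row.indexOf(separator, start);
            if (end < 0) {
                end = row.length();
            }
            if (columnsToInclude[col]) {
                // Only wanted fields are decoded (here: copied into a String).
                fields[col] = row.substring(start, end);
            }
            start = end + 1;
        }
        return fields;
    }

    public static void main(String[] args) {
        boolean[] include = {true, false, true, false, false};
        System.out.println(Arrays.toString(parse("1|M|College|x|y", '|', include)));
    }
}
```

In this sketch the second field is located by the scan but never copied, and the last two fields are never visited at all because no wanted column lies to their right.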
> Vectorization: Fix cross-product reduce sink serialization
> ----------------------------------------------------------
>
> Key: HIVE-13872
> URL: https://issues.apache.org/jira/browse/HIVE-13872
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 2.1.0
> Reporter: Gopal V
> Attachments: HIVE-13872.WIP.patch
>
>
> TPC-DS Q13 produces a cross-product when CBO does not simplify the query:
> {code}
> Caused by: java.lang.RuntimeException: null STRING entry: batchIndex 0 projection column num 1
> at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.nullBytesReadError(VectorExtractRow.java:349)
> at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRowColumn(VectorExtractRow.java:267)
> at org.apache.hadoop.hive.ql.exec.vector.VectorExtractRow.extractRow(VectorExtractRow.java:343)
> at org.apache.hadoop.hive.ql.exec.vector.VectorReduceSinkOperator.process(VectorReduceSinkOperator.java:103)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:837)
> at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:130)
> at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:762)
> ... 18 more
> {code}
> Simplified query
> {code}
> set hive.cbo.enable=false;
> -- explain
> select count(1)
> from store_sales
> ,customer_demographics
> where (
> (
> customer_demographics.cd_demo_sk = store_sales.ss_cdemo_sk
> and customer_demographics.cd_marital_status = 'M'
> )or
> (
> customer_demographics.cd_demo_sk = ss_cdemo_sk
> and customer_demographics.cd_marital_status = 'U'
> ))
> ;
> {code}
> {code}
> Map 3
> Map Operator Tree:
> TableScan
> alias: customer_demographics
> Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
> Reduce Output Operator
> sort order:
> Statistics: Num rows: 1920800 Data size: 717255532 Basic stats: COMPLETE Column stats: NONE
> value expressions: cd_demo_sk (type: int), cd_marital_status (type: string)
> Execution mode: vectorized, llap
> LLAP IO: all inputs
> {code}