[ https://issues.apache.org/jira/browse/HIVE-18422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16330611#comment-16330611 ]
Matt McCline commented on HIVE-18422:
-------------------------------------
I'm looking at it; it doesn't seem quite right.
There is a precedence order in evaluating the 3 variables that control
vectorization of input formats:
1. hive.vectorized.use.vectorized.input.format
2. hive.vectorized.use.vector.serde.deserialize
3. hive.vectorized.use.row.serde.deserialize
If #1 is true and the input format implements
VectorizedInputFormatInterface, then we vectorize.
Otherwise, look at #2. If #2 is true and the input format is TextInputFormat or
SequenceFileInputFormat, then vectorize using the vector serde.
Finally, look at #3. If #3 is true and the input format is not excluded, then
vectorize using the row serde.
So it seems that what is missing from the repro steps is setting
hive.vectorized.use.vectorized.input.format to false. Otherwise, if the
customer does have hive.vectorized.use.vectorized.input.format set to true, the
vectorizer will ignore the exclude, since the exclude only applies to vectorizing
with the row serde, and it will vectorize the vertex.
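The precedence above can be sketched in Java as follows. This is a minimal illustration, not Hive's Vectorizer code: the class, enum and method names are hypothetical, TextInputFormat and SequenceFileInputFormat are assumed to be the org.apache.hadoop.mapred classes, "implements VectorizedInputFormatInterface" is passed in as a boolean, and "excluded" is modeled as a plain set of class names without tying it to a specific config key.

```java
import java.util.Set;

// Hypothetical sketch of the input-format precedence described above.
// Class, enum and method names are illustrative; this is not Hive's Vectorizer API.
public class InputFormatVectorizationSketch {

    enum Mode { VECTORIZED_INPUT_FORMAT, VECTOR_SERDE, ROW_SERDE, NOT_VECTORIZED }

    // Assumption: text and sequence files are identified by these mapred class names.
    static final Set<String> VECTOR_SERDE_FORMATS = Set.of(
            "org.apache.hadoop.mapred.TextInputFormat",
            "org.apache.hadoop.mapred.SequenceFileInputFormat");

    static Mode choose(boolean useVectorizedInputFormat,      // #1 hive.vectorized.use.vectorized.input.format
                       boolean useVectorSerdeDeserialize,     // #2 hive.vectorized.use.vector.serde.deserialize
                       boolean useRowSerdeDeserialize,        // #3 hive.vectorized.use.row.serde.deserialize
                       boolean implementsVectorizedInterface, // format implements VectorizedInputFormatInterface
                       String inputFormatClassName,
                       Set<String> excludedInputFormats) {
        // #1 is checked first, and the exclude list is not consulted here.
        if (useVectorizedInputFormat && implementsVectorizedInterface) {
            return Mode.VECTORIZED_INPUT_FORMAT;
        }
        // #2 applies only to text and sequence-file input formats.
        if (useVectorSerdeDeserialize && VECTOR_SERDE_FORMATS.contains(inputFormatClassName)) {
            return Mode.VECTOR_SERDE;
        }
        // #3 is the only step that honors the exclude list.
        if (useRowSerdeDeserialize && !excludedInputFormats.contains(inputFormatClassName)) {
            return Mode.ROW_SERDE;
        }
        return Mode.NOT_VECTORIZED;
    }

    public static void main(String[] args) {
        String orc = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
        Set<String> excludes = Set.of(orc);
        // #1 true: the excluded ORC format is still vectorized through its vectorized reader.
        System.out.println(choose(true, true, true, true, orc, excludes));
        // #1 false: the exclude now takes effect and the vertex is not vectorized.
        System.out.println(choose(false, true, true, true, orc, excludes));
    }
}
```

Running main prints VECTORIZED_INPUT_FORMAT and then NOT_VECTORIZED, which mirrors the point about the repro: while #1 stays true, the exclude has no effect on a natively vectorized format such as ORC.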
> Vectorized input format should not be used when input format is excluded and
> row.serde is enabled
> -------------------------------------------------------------------------------------------------
>
> Key: HIVE-18422
> URL: https://issues.apache.org/jira/browse/HIVE-18422
> Project: Hive
> Issue Type: Bug
> Components: Vectorization
> Affects Versions: 3.0.0, 2.4.0
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Minor
> Attachments: HIVE-18422.01.patch, HIVE-18422.02.patch
>
>
> HIVE-17534 introduced a config that makes it possible to exclude certain
> input formats from vectorized execution without affecting other input formats.
> If an input format is excluded and row.serde is enabled at the same time, the
> vectorizer still sets {{useVectorizedInputFormat}} to true, which causes
> vectorized readers to be used in row.serde mode.
> In order to reproduce:
> {noformat}
> set hive.fetch.task.conversion=none;
> set hive.vectorized.use.row.serde.deserialize=true;
> set hive.vectorized.use.vector.serde.deserialize=true;
> set hive.vectorized.execution.enabled=true;
> set hive.vectorized.execution.reduce.enabled=true;
> set hive.vectorized.row.serde.inputformat.excludes=;
> -- SORT_QUERY_RESULTS
> -- exclude MapredParquetInputFormat and OrcInputFormat from vectorization;
> -- this should cause mapwork vectorization to be disabled
> set hive.vectorized.input.format.excludes=org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,org.apache.hadoop.hive.ql.io.orc.OrcInputFormat;
> set hive.vectorized.use.vectorized.input.format=true;
> create table orcTbl (t1 tinyint, t2 tinyint)
> stored as orc;
> insert into orcTbl values (54, 9), (-104, 25), (-112, 24);
> explain vectorization select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> select t1, t2, (t1+t2) from orcTbl where (t1+t2) > 10;
> {noformat}
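The behavior the report asks for, as opposed to the precedence described in the comment, can be sketched in Java: an input format listed in hive.vectorized.input.format.excludes should never end up with useVectorizedInputFormat set to true, even when row.serde is enabled. This is a minimal illustration under stated assumptions, not the attached patch: the class and method names are hypothetical, and the exclude value is treated as the comma-separated list of class names used in the repro.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of the behavior requested in this issue.
// Names are illustrative only; this is not Hive's Vectorizer API or the attached patch.
public class ExcludedInputFormatSketch {

    // Parses a comma-separated class-name list such as the repro's value of
    // hive.vectorized.input.format.excludes; an empty value means "exclude nothing".
    static Set<String> parseExcludes(String value) {
        if (value == null || value.trim().isEmpty()) {
            return Set.of();
        }
        return Arrays.stream(value.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
    }

    // Value the vectorizer would record for useVectorizedInputFormat under this sketch:
    // an excluded format is never read through its vectorized reader.
    static boolean useVectorizedInputFormat(boolean useVectorizedInputFormatConf,
                                            boolean implementsVectorizedInterface,
                                            String inputFormatClassName,
                                            Set<String> excludes) {
        return useVectorizedInputFormatConf
                && implementsVectorizedInterface
                && !excludes.contains(inputFormatClassName);
    }

    public static void main(String[] args) {
        Set<String> excludes = parseExcludes(
                "org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat,"
                + "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat");
        String orc = "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat";
        // Prints false: ORC is excluded, so its vectorized reader is not used.
        System.out.println(useVectorizedInputFormat(true, true, orc, excludes));
    }
}
```

Under this sketch the exclude check sits on the vectorized-input-format path itself, which is exactly where the comment says the current precedence does not look.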
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)