[ https://issues.apache.org/jira/browse/HIVE-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258865#comment-17258865 ]
Mustafa İman commented on HIVE-24531:
-------------------------------------
This happens only during a vectorized scan over a table stored as TEXTFILE.
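
One way to check that vectorization is the trigger is to run the same count with vectorization on and then off, and compare the results. This is a minimal sketch, assuming the over1k_n8 TEXTFILE table from the issue description has already been created and loaded. It has not been run against every Hive version, so treat it as a diagnostic aid rather than a confirmed workaround.

{code:sql}
-- Force a real task instead of a fetch-only read, as in the original test.
set hive.fetch.task.conversion=none;

-- Vectorized scan: per the report, bin comes back as NULL in every row.
set hive.vectorized.execution.enabled=true;
select count(1) from over1k_n8 where bin is null;

-- Same query with vectorization disabled, for comparison.
set hive.vectorized.execution.enabled=false;
select count(1) from over1k_n8 where bin is null;
{code}

If the two counts differ, the NULL values are introduced by the vectorized read path and are not present in the data file.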
> Vectorized table scan ignores binary column
> -------------------------------------------
>
> Key: HIVE-24531
> URL: https://issues.apache.org/jira/browse/HIVE-24531
> Project: Hive
> Issue Type: Bug
> Reporter: Mustafa İman
> Priority: Major
>
> The over1k dataset in the Hive codebase contains a binary field. Vectorized table
> scan ignores the binary field and passes it as NULL in all rows. The issue also
> affects insert queries, with both external and managed tables, when
> "hive.stats.autogather=false".
> To reproduce:
> Add "set hive.stats.autogather=false;" at the top of "vector_data_types.q".
> Run "mvn test -Dtest=TestMiniLlapLocalCliDriver -Dqfile=vector_data_types.q".
> Observe that the "bin" column is all NULL when querying any of the tables.
>
> Below is a simplified version of the same test:
> {code:sql}
> set hive.mapred.mode=nonstrict;
> set hive.explain.user=false;
> set hive.fetch.task.conversion=none;
> set hive.stats.autogather=false;
> DROP TABLE over1k_n8;
> DROP TABLE over1korc_n1;
> -- data setup
> CREATE TABLE over1k_n8(t tinyint,
> si smallint,
> i int,
> b bigint,
> f float,
> d double,
> bo boolean,
> s string,
> ts timestamp,
> `dec` decimal(4,2),
> bin binary)
> ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
> STORED AS TEXTFILE;
> LOAD DATA LOCAL INPATH '../../data/files/over1k' OVERWRITE INTO TABLE over1k_n8;
> analyze table over1k_n8 compute statistics;
> analyze table over1k_n8 compute statistics for columns;
> select * from over1k_n8 limit 10;
> select count(1) from over1k_n8 where bin is null;
> CREATE TABLE over1korc_n1(t tinyint,
> si smallint,
> i int,
> b bigint,
> f float,
> d double,
> bo boolean,
> s string,
> ts timestamp,
> `dec` decimal(4,2),
> bin binary)
> STORED AS ORC;
> explain vectorization detail
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> INSERT INTO TABLE over1korc_n1 SELECT * FROM over1k_n8;
> select count(1) from over1korc_n1 where bin is null;
> select * from over1korc_n1 limit 10;
> {code}