[ 
https://issues.apache.org/jira/browse/HIVE-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanth Jayachandran updated HIVE-13330:
-----------------------------------------
    Attachment: HIVE-13330.2.patch

> ORC vectorized string dictionary reader does not differentiate null vs empty 
> string dictionary
> ----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-13330
>                 URL: https://issues.apache.org/jira/browse/HIVE-13330
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.3.0, 2.0.0, 2.1.0
>            Reporter: Prasanth Jayachandran
>            Assignee: Prasanth Jayachandran
>            Priority: Critical
>         Attachments: HIVE-13330.1.patch, HIVE-13330.2.patch
>
>
> The vectorized string dictionary reader cannot distinguish a dictionary in 
> which all entries are null from a dictionary with a single empty-string 
> entry. This causes wrong results when reading data from such files. 
> {code:title=Vectorization On}
> SET hive.vectorized.execution.enabled=true;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> NULL
> {code}
> {code:title=Vectorization Off}
> SET hive.vectorized.execution.enabled=false;
> SET hive.fetch.task.conversion=none;
> select vcol from testnullorc3 limit 1;
> OK
> {code}
> The input table testnullorc3 contains a varchar column vcol with a few empty 
> strings and a few nulls. For this table, the non-vectorized reader returns an 
> empty string as the first row, but the vectorized reader returns NULL. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
