[ 
https://issues.apache.org/jira/browse/DRILL-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13934560#comment-13934560
 ] 

Jason Altekruse commented on DRILL-418:
---------------------------------------

I tried reading the file and saw that the varbinary field is using 
dictionary encoding. We were planning on saving this part of the Parquet spec 
for a little later, when we were able to work in the full optimizer, as there is 
an efficient way to add support using an optimizer rule rather than a bit of 
duplicated work in the reader. That being said, this has repeatedly been a 
problem, and there is seemingly no way to explicitly disable this feature in 
Impala; you just have to write data that exceeds the capacity of the dictionary 
encoding (something like 50,00 or more unique strings). I am going to look and 
see if I can get this solved with our current optimizer, as some of the work 
can be reused once we get the Optiq optimizer integrated in the coming weeks.
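
For anyone unfamiliar with dictionary encoding, here is a minimal Python sketch 
of the general idea. It is not Drill's reader or Impala's writer. The function 
names and the max_entries parameter are hypothetical, and real Parquet writers 
may bound the dictionary by size in bytes rather than by entry count. Each 
distinct value is stored once and the column holds indexes into that list. Once 
the number of distinct values passes the limit, the writer gives up on the 
dictionary, which is why only high-cardinality data avoids it.

```python
def dictionary_encode(values, max_entries=None):
    """Return (dictionary, indices), or None if the dictionary would overflow.

    max_entries is a hypothetical cap on distinct values, used only to
    illustrate the fallback away from dictionary encoding.
    """
    dictionary = []   # distinct values, in order of first appearance
    index_of = {}     # value -> position in dictionary
    indices = []      # one index per input value
    for value in values:
        if value not in index_of:
            if max_entries is not None and len(dictionary) >= max_entries:
                return None  # too many distinct values: caller falls back to plain encoding
            index_of[value] = len(dictionary)
            dictionary.append(value)
        indices.append(index_of[value])
    return dictionary, indices


def dictionary_decode(dictionary, indices):
    """Rebuild the original column by looking up each index."""
    return [dictionary[i] for i in indices]
```

The decode step is the part a reader has to do for every dictionary-encoded 
column, regardless of where in the engine that lookup ends up living.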

> Reading from parquet file with optional fields hangs
> ----------------------------------------------------
>
>                 Key: DRILL-418
>                 URL: https://issues.apache.org/jira/browse/DRILL-418
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Ramana Inukonda Nagaraj
>         Attachments: customer.impala.parquet.tar.gz, hangLog
>
>
> Schema of file:
> message m {
>   optional int64 cust_key;
>   optional binary name;
>   optional binary address;
>   optional int32 nation_key;
>   optional binary phone;
>   optional double acctbal;
>   optional binary mktsegment;
>   optional binary comment_col;
> }
> Changing the optional fields to required fields results in a successful read. 
> Please find logs attached.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
