[ 
https://issues.apache.org/jira/browse/DRILL-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253601#comment-17253601
 ] 

James Turton commented on DRILL-7825:
-------------------------------------

An afterthought: in compressed Parquet columns, the hex VARCHAR 
representations of UUIDs would be amenable to compression, since each hex 
character carries only 4 bits of information, while the direct binary 
representations would not, so the effective space penalty should be well short 
of 225%.
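
A rough way to check the compression argument above is to compress the same 
set of random UUIDs in both forms and compare sizes. The sketch below makes 
several assumptions that are not from the issue: it uses Python's zlib 
(DEFLATE) as a stand-in for a Parquet GZIP codec, random version 4 UUIDs, and a 
hypothetical helper named compressed_sizes. It does not write Parquet at all, 
and a codec without an entropy-coding stage, such as Snappy, may shrink the hex 
text much less.

```python
import uuid
import zlib


def compressed_sizes(n=10_000):
    """Compare raw and zlib-compressed sizes of n random UUIDs.

    Hypothetical helper for illustration only; zlib stands in for a
    Parquet column codec, which is an assumption.
    """
    ids = [uuid.uuid4() for _ in range(n)]

    # Binary form: 16 bytes per UUID, as in fixed_len_byte_array(16).
    binary = b"".join(u.bytes for u in ids)

    # Text form: 36 ASCII characters per UUID (32 hex digits + 4 hyphens).
    text = "".join(str(u) for u in ids).encode("ascii")

    return {
        "binary_raw": len(binary),
        "text_raw": len(text),
        "binary_zlib": len(zlib.compress(binary, 9)),
        "text_zlib": len(zlib.compress(text, 9)),
    }


sizes = compressed_sizes()
print(sizes)
# Uncompressed, text is 36/16 = 2.25 times the binary size.
print("raw ratio:", sizes["text_raw"] / sizes["binary_raw"])
# After compression the ratio depends on the codec and the data.
print("compressed ratio:", sizes["text_zlib"] / sizes["binary_zlib"])
```

The raw ratio is the 225% figure from the issue description; the compressed 
ratio is what would matter for a compressed column, under the assumptions 
listed above.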

> Error: SYSTEM ERROR: RuntimeException: Unknown logical type <LogicalType 
> UUID:UUIDType()>
> -----------------------------------------------------------------------------------------
>
>                 Key: DRILL-7825
>                 URL: https://issues.apache.org/jira/browse/DRILL-7825
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Storage - Parquet
>    Affects Versions: 1.17.0
>         Environment: Windows 10 single local node.
>            Reporter: ian
>            Assignee: Vitalii Diravka
>            Priority: Critical
>             Fix For: 1.19.0
>
>         Attachments: uuid.parquet
>
>
> Reading the Parquet logical type UUID fails.  The only workaround is to 
> store the values as text, a 225% penalty.
> Here is the schema dump for the attached test parquet file.  I can read the 
> file okay from R and natively through C++.
> {code:java}
> 3961 $ parquet-dump-schema uuid.parquet
> required group field_id=0 schema {
>  required fixed_len_byte_array(16) field_id=1 uuid_req1 (UUID);
>  optional fixed_len_byte_array(16) field_id=2 uuid_opt1 (UUID);
>  required fixed_len_byte_array(16) field_id=3 uuid_req2 (UUID);
> }{code}
> I'm new, so I put this as MAJOR after reading the severity definitions, but 
> I gladly defer to those who know better how to classify it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
