[
https://issues.apache.org/jira/browse/DRILL-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253653#comment-17253653
]
ian commented on DRILL-7825:
----------------------------
-rw-r--r-- 1 *** None {color:#FF0000}83548516{color} Dec 22 11:45 uuid-string.parquet
-rw-r--r-- 1 *** None {color:#FF0000}39575254{color} Dec 22 11:45 uuid.parquet
From my simplistic test, the penalty is about 111%. My mistake above: the
theoretical penalty would be 125%, not 225%. I think this outcome results
from the random nature of UUIDs. Because they are pseudo-random, the
resulting strings don't compress very well, probably only slightly better
than the binary.
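For anyone who wants to check the two percentages, here is a minimal Python sketch. It assumes "penalty" means how much larger the text form is than the binary form, and it takes the file sizes from the listing above. The sample UUID value and the helper name penalty_percent are only illustrations, not anything from Drill or Parquet.

```python
import uuid

# File sizes in bytes, copied from the directory listing above.
STRING_FILE_BYTES = 83548516   # uuid-string.parquet
BINARY_FILE_BYTES = 39575254   # uuid.parquet


def penalty_percent(larger: int, smaller: int) -> float:
    """Return how much bigger `larger` is than `smaller`, as a percentage."""
    return (larger / smaller - 1.0) * 100.0


# Any UUID works here; this particular value is an arbitrary illustration.
sample = uuid.UUID("12345678-1234-5678-1234-567812345678")
text_len = len(str(sample))      # canonical text form, hyphens included
binary_len = len(sample.bytes)   # raw fixed-length binary form

# Uncompressed, per-value overhead of text over binary.
theoretical = penalty_percent(text_len, binary_len)
# Overhead actually observed between the two test files.
measured = penalty_percent(STRING_FILE_BYTES, BINARY_FILE_BYTES)

print(f"text form: {text_len} chars, binary form: {binary_len} bytes")
print(f"theoretical penalty: {theoretical:.0f}%")
print(f"measured penalty:    {measured:.1f}%")
```

The measured figure sits a little below the theoretical one, which fits the idea that the text form compresses only slightly better than the binary.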
Good thought, but probably not a practical workaround for any needs at scale.
Thanks again and best.
> Error: SYSTEM ERROR: RuntimeException: Unknown logical type <LogicalType
> UUID:UUIDType()>
> -----------------------------------------------------------------------------------------
>
> Key: DRILL-7825
> URL: https://issues.apache.org/jira/browse/DRILL-7825
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.17.0
> Environment: Windows 10 single local node.
> Reporter: ian
> Assignee: Vitalii Diravka
> Priority: Critical
> Fix For: 1.19.0
>
> Attachments: uuid.parquet
>
>
> Parquet logical type UUID fails on read. Only workaround is to store as
> text, a 125% penalty.
> Here is the schema dump for the attached test parquet file. I can read the
> file okay from R and natively through C++.
> {code:java}
> 3961 $ parquet-dump-schema uuid.parquet
> required group field_id=0 schema {
> required fixed_len_byte_array(16) field_id=1 uuid_req1 (UUID);
> optional fixed_len_byte_array(16) field_id=2 uuid_opt1 (UUID);
> required fixed_len_byte_array(16) field_id=3 uuid_req2 (UUID);
> }{code}
> I'm new.. I put this as MAJOR from reading the severity definitions, but
> gladly defer to those who know better how to classify.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)