[ https://issues.apache.org/jira/browse/DRILL-4048?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jacques Nadeau resolved DRILL-4048.
-----------------------------------
Resolution: Fixed
Fixed in a5a1aa6
> Parquet reader corrupts dictionary encoded binary columns
> ---------------------------------------------------------
>
> Key: DRILL-4048
> URL: https://issues.apache.org/jira/browse/DRILL-4048
> Project: Apache Drill
> Issue Type: Bug
> Components: Storage - Parquet
> Affects Versions: 1.3.0
> Reporter: Rahul Challapalli
> Assignee: Jason Altekruse
> Priority: Blocker
> Attachments: lineitem_dic_enc.parquet
>
>
> git.commit.id.abbrev=04c01bd
> The query below returns corrupted data for the binary columns (the corrupted
> values do not even show up here):
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+--------------------+-------------+--------------------------+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | l_shipdate  | l_commitdate  | l_receiptdate  | l_shipinstruct      | l_shipmode  | l_comment                 |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+--------------------+-------------+--------------------------+
> | 1           | 1552       | 93         | 1             | 17.0        | 24710.35         | 0.04        | 0.02   |               |               | 1996-03-13  | 1996-02-12    | 1996-03-22     | DELIVER IN PE      | T           | egular courts above the  |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+--------------------+-------------+--------------------------+
> {code}
> The same query on an older build (git.commit.id.abbrev=839f8da) returns the
> expected values:
> {code}
> select * from `lineitem_dic_enc.parquet` limit 1;
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+--------------------+-------------+--------------------------+
> | l_orderkey  | l_partkey  | l_suppkey  | l_linenumber  | l_quantity  | l_extendedprice  | l_discount  | l_tax  | l_returnflag  | l_linestatus  | l_shipdate  | l_commitdate  | l_receiptdate  | l_shipinstruct      | l_shipmode  | l_comment                 |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+--------------------+-------------+--------------------------+
> | 1           | 1552       | 93         | 1             | 17.0        | 24710.35         | 0.04        | 0.02   | N             | O             | 1996-03-13  | 1996-02-12    | 1996-03-22     | DELIVER IN PERSON  | TRUCK       | egular courts above the  |
> +-------------+------------+------------+---------------+-------------+------------------+-------------+--------+---------------+---------------+-------------+---------------+----------------+--------------------+-------------+--------------------------+
> {code}
> Below is the output of the parquet-meta command for this dataset:
> {code}
> creator: parquet-mr
> file schema: root
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l_orderkey: REQUIRED INT32 R:0 D:0
> l_partkey: REQUIRED INT32 R:0 D:0
> l_suppkey: REQUIRED INT32 R:0 D:0
> l_linenumber: REQUIRED INT32 R:0 D:0
> l_quantity: REQUIRED DOUBLE R:0 D:0
> l_extendedprice: REQUIRED DOUBLE R:0 D:0
> l_discount: REQUIRED DOUBLE R:0 D:0
> l_tax: REQUIRED DOUBLE R:0 D:0
> l_returnflag: REQUIRED BINARY O:UTF8 R:0 D:0
> l_linestatus: REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipdate: REQUIRED INT32 O:DATE R:0 D:0
> l_commitdate: REQUIRED INT32 O:DATE R:0 D:0
> l_receiptdate: REQUIRED INT32 O:DATE R:0 D:0
> l_shipinstruct: REQUIRED BINARY O:UTF8 R:0 D:0
> l_shipmode: REQUIRED BINARY O:UTF8 R:0 D:0
> l_comment: REQUIRED BINARY O:UTF8 R:0 D:0
> row group 1: RC:60175 TS:3049610
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> l_orderkey: INT32 SNAPPY DO:0 FPO:4 SZ:146159/165487/1.13 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_partkey: INT32 SNAPPY DO:0 FPO:146163 SZ:90867/90918/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_suppkey: INT32 SNAPPY DO:0 FPO:237030 SZ:53244/53230/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_linenumber: INT32 SNAPPY DO:0 FPO:290274 SZ:14909/22767/1.53 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_quantity: DOUBLE SNAPPY DO:0 FPO:305183 SZ:45536/45715/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_extendedprice: DOUBLE SNAPPY DO:0 FPO:350719 SZ:327454/407907/1.25 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_discount: DOUBLE SNAPPY DO:0 FPO:678173 SZ:30349/30359/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_tax: DOUBLE SNAPPY DO:0 FPO:708522 SZ:30334/30342/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_returnflag: BINARY SNAPPY DO:0 FPO:738856 SZ:14700/14714/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_linestatus: BINARY SNAPPY DO:0 FPO:753556 SZ:8964/9506/1.06 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_shipdate: INT32 SNAPPY DO:0 FPO:762520 SZ:100537/100514/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_commitdate: INT32 SNAPPY DO:0 FPO:863057 SZ:100314/100282/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_receiptdate: INT32 SNAPPY DO:0 FPO:963371 SZ:100584/100558/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_shipinstruct: BINARY SNAPPY DO:0 FPO:1063955 SZ:15311/15303/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_shipmode: BINARY SNAPPY DO:0 FPO:1079266 SZ:22800/22797/1.00 VC:60175 ENC:BIT_PACKED,PLAIN_DICTIONARY
> l_comment: BINARY SNAPPY DO:0 FPO:1102066 SZ:795339/1839211/2.31 VC:60175 ENC:PLAIN,BIT_PACKED
> {code}