[ https://issues.apache.org/jira/browse/PARQUET-816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney updated PARQUET-816:
---------------------------------
Attachment: nation.dict.parquet
> [C++] Failure decoding sample dict-encoded file from parquet-compatibility project
> ----------------------------------------------------------------------------------
>
> Key: PARQUET-816
> URL: https://issues.apache.org/jira/browse/PARQUET-816
> Project: Parquet
> Issue Type: Bug
> Components: parquet-cpp
> Reporter: Wes McKinney
> Attachments: nation.dict.parquet
>
>
> See attached. This throws an exception when read:
> {code}
> $ debug/parquet_reader nation.dict.parquet
> File statistics:
> Version: 1
> Created By: parquet-mr
> Total rows: 25
> Number of RowGroups: 1
> Number of Real Columns: 4
> Number of Columns: 4
> Number of Selected Columns: 4
> Column 0: nation_key (INT32)
> Column 1: name (BYTE_ARRAY)
> Column 2: region_key (INT32)
> Column 3: comment_col (BYTE_ARRAY)
> --- Row Group 0 ---
> --- Total Bytes 0 ---
> rows: 25---
> Column 0
> , values: 25 Statistics Not Set
> compression: UNCOMPRESSED, encodings:
> uncompressed size: 125, compressed size: 125
> Column 1
> , values: 25 Statistics Not Set
> compression: UNCOMPRESSED, encodings:
> uncompressed size: 322, compressed size: 322
> Column 2
> , values: 25 Statistics Not Set
> compression: UNCOMPRESSED, encodings:
> uncompressed size: 125, compressed size: 125
> Column 3
> , values: 25 Statistics Not Set
> compression: UNCOMPRESSED, encodings:
> uncompressed size: 2002, compressed size: 2002
> nation_key name region_key
> comment_col
> 0 Parquet error: Unexpected end of stream.
> {code}
> However, I checked that I can read this file with Impala:
> {code}
> In [13]: hdfs.put('/tmp/nation-dict-test/test.parq', 'nation.dict.parquet')
> Out[13]: '/tmp/nation-dict-test/test.parq'
> In [14]: pf = con.parquet_file('/tmp/nation-dict-test')
> In [15]: pf.execute()
> Out[15]:
> nation_key name region_key \
> 0 0 ALGERIA 0
> 1 1 ARGENTINA 1
> 2 2 BRAZIL 1
> 3 3 CANADA 1
> 4 4 EGYPT 4
> 5 5 ETHIOPIA 0
> 6 6 FRANCE 3
> 7 7 GERMANY 3
> 8 8 INDIA 2
> 9 9 INDONESIA 2
> 10 10 IRAN 4
> 11 11 IRAQ 4
> 12 12 JAPAN 2
> 13 13 JORDAN 4
> 14 14 KENYA 0
> 15 15 MOROCCO 0
> 16 16 MOZAMBIQUE 0
> 17 17 PERU 1
> 18 18 CHINA 2
> 19 19 ROMANIA 3
> 20 20 SAUDI ARABIA 4
> 21 21 VIETNAM 2
> 22 22 RUSSIA 3
> 23 23 UNITED KINGDOM 3
> 24 24 UNITED STATES 1
> comment_col
> 0 haggle. carefully final deposits detect slyly...
> 1 al foxes promise slyly according to the regula...
> 2 y alongside of the pending deposits. carefully...
> 3 eas hang ironic, silent packages. slyly regula...
> 4 y above the carefully unusual theodolites. fin...
> 5 ven packages wake quickly. regu
> 6 refully final requests. regular, ironi
> 7 l platelets. regular accounts x-ray: unusual, ...
> 8 ss excuses cajole slyly across the packages. d...
> 9 slyly express asymptotes. regular deposits ha...
> 10 efully alongside of the slyly final dependenci...
> 11 nic deposits boost atop the quickly final requ...
> 12 ously. final, express gifts cajole a
> 13 ic deposits are blithely about the carefully r...
> 14 pending excuses haggle furiously deposits. pe...
> 15 rns. blithely bold courts among the closely re...
> 16 s. ironic, unusual asymptotes wake blithely r
> 17 platelets. blithely pending dependencies use f...
> 18 c dependencies. furiously express notornis sle...
> 19 ular asymptotes are about the furious multipli...
> 20 ts. silent requests haggle. closely express pa...
> 21 hely enticingly express accounts. even, final
> 22 requests against the platelets use never acco...
> 23 eans boost carefully special requests. account...
> 24 y final packages. slow foxes cajole quickly. q...
> {code}
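
Since Impala reads the file successfully, the container itself is presumably well-formed and the failure is more likely in the decoding path. A minimal sketch of a framing check that could confirm this before digging into the decoder, assuming only the standard Parquet layout (a leading and trailing 4-byte `PAR1` magic, with a 4-byte little-endian footer length just before the trailing magic). The function name `check_parquet_framing` is hypothetical and not part of parquet-cpp or any other library:

```python
import struct
import sys

MAGIC = b"PAR1"  # 4-byte magic at both the start and the end of a Parquet file


def check_parquet_framing(data: bytes) -> int:
    """Return the footer metadata length if the file framing looks valid.

    Hypothetical helper: it checks only the outer container, not the
    pages or the dictionary encoding inside it.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC:
        raise ValueError("missing leading PAR1 magic")
    if data[-4:] != MAGIC:
        raise ValueError("missing trailing PAR1 magic")
    # Footer length is a little-endian uint32 just before the trailing magic.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python check_framing.py nation.dict.parquet
    with open(sys.argv[1], "rb") as f:
        print("footer length:", check_parquet_framing(f.read()))
```

A passing check would only show that the file is not truncated at the container level. It says nothing about the dictionary pages themselves, so it narrows the problem rather than locating it.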