[https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313258#comment-17313258]
Micah Kornfield commented on ARROW-11629:
-----------------------------------------
[~matthros] I updated the gist to include all columns. I tested (hopefully
correctly) with parquet-avro (which should use the corresponding parquet-mr
version) at 1.11.0, 1.11.1, and 1.12. All Arrow files exactly match the Arrow
data originally parsed from the CSV.
Given that parquet-dotnet works on the latest version and Parquet MR seems
able to faithfully read the data for prior versions, I think the last
remaining issue might lie somewhere in Drill. If it is OK with you, I think
we should close this bug, as it doesn't seem to be a problem with Arrow
(at least as of version 3.0).
> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files
> not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
> Key: ARROW-11629
> URL: https://issues.apache.org/jira/browse/ARROW-11629
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 3.0.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: drill_query.csv, foo.parquet,
> image-2021-02-15-15-49-41-908.png, output.csv, output.parquet,
> parquet-dotnet.csv
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns
> to float32, and export it to Parquet, the resulting file is corrupted: it
> is no longer readable by Apache Drill or Parquet.Net.
>
> Update: the bug is in the "*Dictionary Encoding*" feature. If I switch it
> off for float32 columns, everything works as expected.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)