[https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313258#comment-17313258]
Micah Kornfield commented on ARROW-11629:
-----------------------------------------
[~matthros] I updated the gist to include all columns. I tested (hopefully
correctly) with parquet-avro (which should use the corresponding parquet-mr
version) at 1.11.0, 1.11.1, and 1.12. All Arrow files exactly match the Arrow
data originally parsed from the CSV.
Given that parquet-dotnet works on the latest version and Parquet MR seems
able to faithfully read the data for prior versions, I think the last
remaining issue might lie somewhere in Drill. If it is OK with you, I think
we should close this bug, as it doesn't seem to be a problem with Arrow
(at least as of version 3.0).
> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files
> not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
> Key: ARROW-11629
> URL: https://issues.apache.org/jira/browse/ARROW-11629
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 3.0.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: drill_query.csv, foo.parquet,
> image-2021-02-15-15-49-41-908.png, output.csv, output.parquet,
> parquet-dotnet.csv
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns
> to float32, and export it to Parquet, the resulting file is corrupted: it
> is no longer readable by Apache Drill or Parquet.Net.
>
> Update: the bug is in the "*Dictionary Encoding*" feature. If I switch it
> off for float32 columns, everything works as expected.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)