[ https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285894#comment-17285894 ]
Joris Van den Bossche commented on ARROW-11629:
-----------------------------------------------
They mention that it might be related to dictionary encoding, which is also
the difference from fastparquet that [~emkornfield] noted above (Arrow tries
dictionary encoding first for float columns).
Could you check whether disabling dictionary encoding solves the issue? (I
_think_ you can disable it in pyarrow with {{use_dictionary=False}}.)
> [C++] Writing float32 values makes parquet files not readable for some tools
> ----------------------------------------------------------------------------
>
> Key: ARROW-11629
> URL: https://issues.apache.org/jira/browse/ARROW-11629
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 3.0.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png,
> output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns
> to float32, and export it to Parquet, the Parquet file gets corrupted.
> It is no longer readable by Apache Drill or Parquet.Net.