[
https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285350#comment-17285350
]
Micah Kornfield edited comment on ARROW-11629 at 2/16/21, 5:10 PM:
-------------------------------------------------------------------
There are two differences from fastparquet:
1. The column metadata appears to be placed differently (judging by
file_offset, Arrow seems to place all column metadata together, whereas
fastparquet seems to place it at the beginning of the column).
2. It looks like Arrow's Parquet writer first tries dictionary-encoding the
float columns (although it looks like it does this for doubles as well).
was (Author: emkornfield):
There are two differences from fastparquet:
1. The column metadata appears to be placed differently (judging by
file_offset, Arrow seems to place all column metadata together, whereas
fastparquet seems to place it at the beginning of the column).
2. It looks like Arrow's Parquet writer first tries dictionary-encoding the
float columns (it does not appear to do this for doubles).
> [C++] Writing float32 values makes parquet files not readable for some tools
> ----------------------------------------------------------------------------
>
> Key: ARROW-11629
> URL: https://issues.apache.org/jira/browse/ARROW-11629
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 3.0.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png,
> output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns
> to float32, and export it to Parquet, the Parquet file gets corrupted.
> It is no longer readable by Apache Drill or Parquet.Net.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)