[ 
https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17285350#comment-17285350
 ] 

Micah Kornfield commented on ARROW-11629:
-----------------------------------------

There are two differences from fastparquet:

1. The placement of column metadata appears to differ (going by file_offset, 
Arrow seems to place all column metadata together, while fastparquet seems to 
place it at the beginning of the column).

2. It looks like Arrow's Parquet writer first tries dictionary encoding for 
the float columns (it does not appear to do this for doubles).

> [C++] Writing float32 values makes parquet files not readable for some tools
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png, 
> output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns 
> to float32, and export it to Parquet, the Parquet file gets corrupted. 
> It is no longer readable by Apache Drill or Parquet.Net.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
