[jira] [Updated] (ARROW-11629) [C++] Writing float32 values with "Dictionary Encoding" makes parquet files not readable for some tools

Matthias Rosenthaler (Jira) Mon, 01 Mar 2021 06:08:05 -0800


     [ 
https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Matthias Rosenthaler updated ARROW-11629:
-----------------------------------------
    Description: 
When I read the attached CSV file with pyarrow, cast the float64 columns to 
float32, and export the result to Parquet, the Parquet file gets corrupted. 
Apache Drill and Parquet.Net can no longer read it.

Update: The bug is in the "*Dictionary Encoding*" feature. If I switch it off 
for float32 columns, everything works as expected.

  was:
When I read the attached CSV file with pyarrow, cast the float64 columns to 
float32, and export the result to Parquet, the Parquet file gets corrupted. 
Apache Drill and Parquet.Net can no longer read it.

Update: The bug is in the "*Dictionary Encoding*" feature. If I switch it off 
for floats, everything works as expected.


> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files 
> not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png, 
> output.csv, output.parquet
>
>
> When I read the attached CSV file with pyarrow, cast the float64 columns to 
> float32, and export the result to Parquet, the Parquet file gets corrupted. 
> Apache Drill and Parquet.Net can no longer read it.
>  
> Update: The bug is in the "*Dictionary Encoding*" feature. If I switch it off 
> for float32 columns, everything works as expected.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
