[jira] [Commented] (ARROW-11629) [C++] Writing float32 values makes parquet files not readable for some tools

Micah Kornfield (Jira) Mon, 15 Feb 2021 09:49:18 -0800


    [ https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17284885#comment-17284885 ]


Micah Kornfield commented on ARROW-11629:
-----------------------------------------

Would you mind sharing how the Parquet file was written?

 

The provided Parquet file loads successfully for me, but its contents are not 
equal to the CSV data (when I load the CSV as below).  When I round-trip the 
data myself, though, it appears to work:

 

{{import pyarrow}}

{{import pyarrow.csv}}

{{from pyarrow import parquet}}

{{from_csv = pyarrow.csv.read_csv("output.csv", convert_options=pyarrow.csv.ConvertOptions(column_types={"I_Injection_IA": pyarrow.float32(), "InjectionRate": pyarrow.float32()}))}}

{{parquet.write_table(from_csv, 'foo.parquet')}}

{{from_parquet = parquet.read_table('foo.parquet')}}

{{from_parquet == from_csv  # Yields True}}

 

> [C++] Writing float32 values makes parquet files not readable for some tools
> ----------------------------------------------------------------------------
>
>                 Key: ARROW-11629
>                 URL: https://issues.apache.org/jira/browse/ARROW-11629
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: C++, Python
>    Affects Versions: 3.0.0
>            Reporter: Matthias Rosenthaler
>            Priority: Major
>         Attachments: image-2021-02-15-15-49-41-908.png, output.csv, 
> output.parquet
>
>
> If I try to read the attached csv file with pyarrow, changing the float64 
> columns to float32 and export it to parquet, the parquet file gets corrupted. 
> It is not readable for apache drill or Parquet.Net any longer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
