[
https://issues.apache.org/jira/browse/ARROW-11629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17312926#comment-17312926
]
Micah Kornfield commented on ARROW-11629:
-----------------------------------------
[~matthros] Sorry, I posted a lot of thoughts in a row, so my communication
might have been unclear. I created Java code using parquet-mr
([gist|https://gist.github.com/emkornfield/efd3a4c3c1012dc19cf9769198e3bffe])
that parses the Parquet file written by pyarrow.
The Java code then reads through all the data and selects two columns to write
out in the Arrow format. When I read the Arrow file produced by Java back in
Python, the columns are identical. So it seems the latest version of
parquet-mr (the Java library, which I believe Drill relies on) is able to read
the files produced by pyarrow. If there are other columns I should compare, I
can add them (I compared the first column, which appears to be a row number,
and one of the float columns, 'I_Injection_IA').
So my question is: what do you mean when you say values are "displaced" in
Drill? Was it for 'I_Injection_IA' or for other columns?
> [C++] Writing float32 values with "Dictionary Encoding" makes parquet files
> not readable for some tools
> -------------------------------------------------------------------------------------------------------
>
> Key: ARROW-11629
> URL: https://issues.apache.org/jira/browse/ARROW-11629
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 3.0.0
> Reporter: Matthias Rosenthaler
> Priority: Major
> Attachments: foo.parquet, image-2021-02-15-15-49-41-908.png,
> output.csv, output.parquet
>
>
> If I read the attached CSV file with pyarrow, change the float64 columns to
> float32, and export it to Parquet, the Parquet file gets corrupted.
> It is no longer readable by Apache Drill or Parquet.Net.
>
> Update: The bug is in the "*Dictionary Encoding*" feature. If I switch it
> off for float32 columns, everything works as expected.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)