[ https://issues.apache.org/jira/browse/ARROW-11069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258175#comment-17258175 ]
Joris Van den Bossche commented on ARROW-11069:
-----------------------------------------------
I included a simplified version of this case in a test added in
https://github.com/apache/arrow/pull/9091
> [C++] Parquet writer incorrect data being written when data type is struct
> --------------------------------------------------------------------------
>
> Key: ARROW-11069
> URL: https://issues.apache.org/jira/browse/ARROW-11069
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 2.0.0
> Environment: pandas v1.0.4
> Reporter: Palash Goel
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: first_write.parquet, image-2020-12-30-01-19-20-491.png,
> image-2020-12-30-01-19-42-739.png, image-2020-12-30-01-20-45-183.png,
> original.parquet
>
>
> Incorrect data is written when writing a dict (struct) column to Parquet using pyarrow.
>
> {code:python}
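> import pandas as pd
>
> # Round-trip the attached file through the Parquet writer.
> orig = pd.read_parquet("original.parquet")
> orig.to_parquet("first_write.parquet")
> first_write = pd.read_parquet("first_write.parquet")
>
> # Prints True only if the round trip preserved the data.
> print(orig.equals(first_write))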
> {code}
>
> The incorrect results start appearing after index 1024. first_write.parquet
> was created by reading original.parquet and then writing it again. I don't see
> any obvious pattern in the shuffled rows.
> !image-2020-12-30-01-20-45-183.png!
> Original records
> !image-2020-12-30-01-19-20-491.png!
> Written records
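
The report says the written rows diverge from the originals after index 1024. A minimal sketch of how that first divergent row could be located is shown below. It is not taken from the issue or from the linked pull request: the helper name first_mismatch and the synthetic rows are assumptions made for illustration, and the reversed tail only imitates the symptom, since the real shuffle pattern is unknown.

```python
from typing import Any, Optional, Sequence


def first_mismatch(original: Sequence[Any], written: Sequence[Any]) -> Optional[int]:
    """Return the index of the first differing row, or None if all rows match."""
    for i, (a, b) in enumerate(zip(original, written)):
        if a != b:
            return i
    # A length difference counts as a mismatch at the end of the shorter input.
    if len(original) != len(written):
        return min(len(original), len(written))
    return None


# Synthetic stand-in for the struct column: one dict per row (hypothetical data).
original_rows = [{"id": i, "label": f"row-{i}"} for i in range(2048)]

# Imitate the reported symptom: rows 0..1024 are intact, later rows are
# out of order (simply reversed here; the real pattern is not known).
written_rows = original_rows[:1025] + original_rows[1025:][::-1]

print(first_mismatch(original_rows, written_rows))  # 1025
```

With the attached files, the same helper could be fed the values of one column from orig and first_write (for example via Series.tolist()), provided the values compare cleanly with !=.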