Kari Schoonbee created ARROW-11257:
--------------------------------------
Summary: PyArrow Table contains different data after writing and
reloading from Parquet
Key: ARROW-11257
URL: https://issues.apache.org/jira/browse/ARROW-11257
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 2.0.0
Reporter: Kari Schoonbee
Attachments: anonymised.jsonl, pyarrow_parquet_issue.ipynb
* I'm loading a JSON Lines file into a table using
{code:python}
pa.json.read_json{code}
It contains one column that is a nested dictionary.
* I select a row by key and inspect its nested dictionary.
* I write the table to Parquet.
* I load the table again from the Parquet file.
* I check the same key and the nested dictionary is not the same.
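The steps above can be sketched roughly as follows. This is a minimal illustration and not the code from the attached notebook: the helper names (`table_to_rows`, `row_for_key`, `roundtrip_rows`) are hypothetical, and using `msisdn` as the lookup key is an assumption.

```python
def table_to_rows(table):
    """Convert a pyarrow Table into a list of plain Python dicts, one per row."""
    columns = table.to_pydict()  # column name -> list of Python values
    names = list(columns)
    return [dict(zip(names, values)) for values in zip(*columns.values())]


def row_for_key(rows, key_column, key):
    """Return the first row whose key_column equals key, or None if absent."""
    return next((row for row in rows if row.get(key_column) == key), None)


def roundtrip_rows(jsonl_path, parquet_path):
    """Read JSON Lines, write Parquet, reload, and return (before, after) rows."""
    # Imported lazily so the pure-Python helpers above work without pyarrow.
    import pyarrow.json as pj
    import pyarrow.parquet as pq

    table = pj.read_json(str(jsonl_path))       # nested objects become struct columns
    pq.write_table(table, str(parquet_path))    # write the table to Parquet
    reloaded = pq.read_table(str(parquet_path)) # load it again from the file
    return table_to_rows(table), table_to_rows(reloaded)
```

For example, `before, after = roundtrip_rows("anonymised.jsonl", "roundtrip.parquet")` (the Parquet file name is made up), then comparing `row_for_key(before, "msisdn", key)["scorecard_result"]` with the same lookup on `after` should show whether the nested dictionary survived the round trip.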
To reproduce:
See the attached JSON Lines file and Jupyter Notebook.
The JSON file contains one entry per customer with a generated `msisdn`,
`scoring_request_id` and `scorecard_result` object. Each `scorecard_result`
consists of a list of feature objects, all with a value equal to the
`msisdn`, and a score.
The notebook reads the file and demonstrates the issue.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)