Farzad Abdolhosseini created ARROW-8868:
-------------------------------------------
Summary: [Python] Feather format cannot store/retrieve lists correctly?
Key: ARROW-8868
URL: https://issues.apache.org/jira/browse/ARROW-8868
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.17.1
Environment: Python 3.8.2
PyArrow 0.17.1
Pandas 1.0.3
Linux (Manjaro)
Reporter: Farzad Abdolhosseini
I'm seeing very strange behavior when I try to store and retrieve a Pandas
DataFrame using the Feather format. A simplified example:
{code:python}
>>> import pandas as pd
>>> df = pd.DataFrame(data={"scalar": [1, 2], "array": [[1], [7]]})
>>> df
   scalar array
0       1   [1]
1       2   [7]
>>> df.to_feather("test.ft")
>>> pd.read_feather("test.ft")
   scalar                 array
0       1                  [16]
1       2 [1045468844972122628]
{code}
As you can see, the retrieved data is incorrect. I originally tried using the
`feather-format` package directly (without going through Pandas), and that
didn't work either.
By varying the DataFrame that is stored, I can also get different but still
incorrect behavior, e.g. a larger list, an error saying that the file size is
incorrect, or simply a segmentation fault.
This is my first time using Feather/Arrow BTW.