Joris Van den Bossche created ARROW-10643:
---------------------------------------------

             Summary: [Python] Pandas<->pyarrow roundtrip failing to recreate index for empty dataframe
                 Key: ARROW-10643
                 URL: https://issues.apache.org/jira/browse/ARROW-10643
             Project: Apache Arrow
          Issue Type: New Feature
          Components: Python
            Reporter: Joris Van den Bossche


From https://github.com/pandas-dev/pandas/issues/37897

The roundtrip of an empty pandas.DataFrame _with_ an index (so no columns, but 
a non-zero number of rows) isn't faithful: the index is dropped and the result 
has shape (0, 0) instead of (10, 0):

{code}
In [33]: df = pd.DataFrame(index=pd.RangeIndex(0, 10, 1))

In [34]: df
Out[34]: 
Empty DataFrame
Columns: []
Index: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [35]: df.shape
Out[35]: (10, 0)

In [36]: table = pa.table(df)

In [37]: table.to_pandas()
Out[37]: 
Empty DataFrame
Columns: []
Index: []

In [38]: table.to_pandas().shape
Out[38]: (0, 0)
{code}

Since the pandas metadata stored in the Table's schema already contains this 
RangeIndex information:

{code}
In [39]: table.schema.pandas_metadata
Out[39]: 
{'index_columns': [{'kind': 'range',
   'name': None,
   'start': 0,
   'stop': 10,
   'step': 1}],
 'column_indexes': [{'name': None,
   'field_name': None,
   'pandas_type': 'empty',
   'numpy_type': 'object',
   'metadata': None}],
 'columns': [],
 'creator': {'library': 'pyarrow', 'version': '3.0.0.dev162+g305160495'},
 'pandas_version': '1.2.0.dev0+1225.g91f5bfcdc4'}
{code}

we should in principle be able to correctly roundtrip this case.
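To illustrate that the metadata alone is enough to recover the row labels, here is a minimal pure-Python sketch. It assumes pyarrow stores the pandas metadata as UTF-8 JSON bytes under the b"pandas" key of the schema metadata (which is what {{Schema.pandas_metadata}} parses), and that a serialized RangeIndex appears as a single dict entry of kind 'range' in 'index_columns'. The helpers {{decode_pandas_metadata}} and {{range_index_labels}} are hypothetical names, not pyarrow API, and the raw metadata dict is built by hand to mirror the output above.

```python
import json


def decode_pandas_metadata(schema_metadata):
    # Assumption: the pandas metadata is UTF-8 JSON bytes stored under the
    # b"pandas" key of the schema metadata; return None if it is absent.
    if not schema_metadata or b"pandas" not in schema_metadata:
        return None
    return json.loads(schema_metadata[b"pandas"].decode("utf8"))


def range_index_labels(pandas_metadata):
    # Return the row labels as a Python range when the index was serialized
    # as a single 'range'-kind entry; otherwise return None. Materialized
    # index columns are referenced by field name (a string), not a dict.
    index_columns = pandas_metadata.get("index_columns", [])
    if len(index_columns) != 1:
        return None
    spec = index_columns[0]
    if not isinstance(spec, dict) or spec.get("kind") != "range":
        return None
    return range(spec["start"], spec["stop"], spec["step"])


# Hypothetical raw schema metadata, mirroring the pandas_metadata shown above
raw_metadata = {
    b"pandas": json.dumps({
        "index_columns": [{"kind": "range", "name": None,
                           "start": 0, "stop": 10, "step": 1}],
        "columns": [],
    }).encode("utf8")
}

meta = decode_pandas_metadata(raw_metadata)
labels = range_index_labels(meta)
print(len(labels), list(labels))  # 10 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

With pandas available, {{pd.DataFrame(index=pd.RangeIndex(labels.start, labels.stop, labels.step))}} would then rebuild a frame of shape (10, 0), matching the original {{df}}.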



--
This message was sent by Atlassian Jira
(v8.3.4#803005)