[
https://issues.apache.org/jira/browse/ARROW-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281016#comment-16281016
]
ASF GitHub Bot commented on ARROW-1895:
---------------------------------------
jorisvandenbossche commented on issue #1397: ARROW-1895: [Python] Add
field_name to pandas index metadata
URL: https://github.com/apache/arrow/pull/1397#issuecomment-349792851
One special case that I encountered in
https://github.com/apache/arrow/pull/1386 is a DataFrame with a column named
`None` (this arises in IPC when serializing a Series without a name).
This case is not yet handled here:
```
In [6]: pa.Table.from_pandas(pd.DataFrame({None: [1,2,3]}))
Out[6]:
pyarrow.Table
None: int64
__index_level_0__: int64
metadata
--------
{b'pandas': b'{"index_columns": ["__index_level_0__"], "column_indexes": [{"na'
            b'me": null, "pandas_type": "mixed", "numpy_type": "object", "meta'
            b'data": null}], "columns": [{"name": null, "field_name": null, "p'
            b'andas_type": "int64", "numpy_type": "int64", "metadata": null}, '
            b'{"name": null, "field_name": "__index_level_0__", "pandas_type":'
            b' "int64", "numpy_type": "int64", "metadata": null}], "pandas_ver'
            b'sion": "0.22.0.dev0+260.g5da3759"}'}
```
So for the data column, `"name"` and `"field_name"` are both null, while
`"field_name"` should be `"None"`, matching the `None` field shown in the
table schema.
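To make the expected result concrete, here is a minimal sketch of how such a column entry could be built. The helper `column_metadata_entry` is hypothetical and is not pyarrow's actual implementation; it only assumes that the Arrow field name is the stringified column name, as the `None: int64` field in the schema above suggests.

```python
import json


def column_metadata_entry(name, pandas_type="int64", numpy_type="int64"):
    """Build one 'columns' entry (hypothetical helper, not a pyarrow API)."""
    # Keep the original pandas name untouched so it can be restored later,
    # but stringify it for field_name so that None maps to "None".
    field_name = name if isinstance(name, str) else str(name)
    return {
        "name": name,
        "field_name": field_name,
        "pandas_type": pandas_type,
        "numpy_type": numpy_type,
        "metadata": None,
    }


entry = column_metadata_entry(None)
print(json.dumps(entry))
# {"name": null, "field_name": "None", "pandas_type": "int64", "numpy_type": "int64", "metadata": null}
```

With this shape, `"name"` still round-trips as null while `"field_name"` points at the actual field in the table.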
> [Python] Add field_name to pandas index metadata
> ------------------------------------------------
>
> Key: ARROW-1895
> URL: https://issues.apache.org/jira/browse/ARROW-1895
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See the discussion here for details:
> https://github.com/pandas-dev/pandas/pull/18201
> In short, we need a way to map index column names to field names in an Arrow
> Table.
> Additionally, we currently depend on the index columns being written at the
> end of the table; fixing this would allow us to read metadata written by other
> systems (e.g., fastparquet) that don't make this assumption.
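As an illustration of the second point in the description, the sketch below locates index columns by field name instead of assuming they come last. The function `split_fields` and the trimmed metadata are assumptions made for this example, not pyarrow code; the metadata keys follow the shape shown in the comment above.

```python
import json

# Metadata trimmed to the relevant keys, shaped like the example above.
pandas_metadata = json.loads(
    '{"index_columns": ["__index_level_0__"],'
    ' "columns": [{"name": null, "field_name": "None"},'
    ' {"name": null, "field_name": "__index_level_0__"}]}'
)


def split_fields(schema_field_names, metadata):
    """Return (index_positions, data_positions) looked up by field name.

    Hypothetical helper: it does not assume index columns are written last.
    """
    index_fields = set(metadata["index_columns"])
    index_positions = [
        i for i, field in enumerate(schema_field_names) if field in index_fields
    ]
    data_positions = [
        i for i, field in enumerate(schema_field_names) if field not in index_fields
    ]
    return index_positions, data_positions


# A writer that puts the index column first is handled the same way.
index_first = split_fields(["__index_level_0__", "None"], pandas_metadata)
index_last = split_fields(["None", "__index_level_0__"], pandas_metadata)
```

Because the lookup goes through the field names recorded in the metadata, the position of the index columns in the table no longer matters.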