[ https://issues.apache.org/jira/browse/ARROW-1895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16283471#comment-16283471 ]
ASF GitHub Bot commented on ARROW-1895:
---------------------------------------
jorisvandenbossche commented on a change in pull request #1397:
ARROW-1895/ARROW-1897: [Python] Add field_name to pandas index metadata
URL: https://github.com/apache/arrow/pull/1397#discussion_r155769103
##########
File path: python/pyarrow/tests/test_convert_pandas.py
##########
@@ -174,15 +206,32 @@ def test_categorical_column_index(self):
         column_indexes, = js['column_indexes']
         assert column_indexes['name'] is None
         assert column_indexes['pandas_type'] == 'categorical'
-        assert column_indexes['numpy_type'] == 'object'
+        assert column_indexes['numpy_type'] == 'int8'
         md = column_indexes['metadata']
         assert md['num_categories'] == 3
         assert md['ordered'] is False
+    def test_string_column_index(self):
+        df = pd.DataFrame(
+            [(1, 'a', 2.0), (2, 'b', 3.0), (3, 'c', 4.0)],
+            columns=pd.Index(list('def'), name='stringz')
+        )
+        t = pa.Table.from_pandas(df, preserve_index=True)
+        raw_metadata = t.schema.metadata
+        js = json.loads(raw_metadata[b'pandas'].decode('utf8'))
+
+        column_indexes, = js['column_indexes']
+        assert column_indexes['name'] == 'stringz'
+        assert column_indexes['name'] == column_indexes['field_name']
+        assert column_indexes['pandas_type'] == ('bytes' if PY2 else 'unicode')
+        assert column_indexes['numpy_type'] == 'object'
+
+        md = column_indexes['metadata']
+        assert len(md) == 1
+        assert md['encoding'] == 'UTF-8'
Review comment:
Tests are failing on this one. Maybe this is only the case for unicode and
not for bytes, and thus only for `not PY2`?
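The reviewer's suggestion can be sketched as a guarded assertion. This assumes, as the comment speculates but does not confirm, that only unicode column indexes carry an `encoding` entry in their metadata. `check_string_column_index` is a hypothetical helper that takes the already-parsed `column_indexes` dict, so it runs without pandas or pyarrow; the sample dicts mirror the test's expectations and are not captured pyarrow output.

```python
def check_string_column_index(column_indexes, py2):
    # Hypothetical helper: the assertions from test_string_column_index,
    # applied to the parsed metadata dict, with the encoding check guarded
    # by `not py2` as the review comment suggests.
    assert column_indexes['name'] == 'stringz'
    assert column_indexes['name'] == column_indexes['field_name']
    assert column_indexes['pandas_type'] == ('bytes' if py2 else 'unicode')
    assert column_indexes['numpy_type'] == 'object'

    if not py2:
        # Assumption (unverified): only unicode column indexes carry
        # {'encoding': 'UTF-8'} in their metadata.
        md = column_indexes['metadata']
        assert len(md) == 1
        assert md['encoding'] == 'UTF-8'


# Illustrative inputs mirroring the test's expectations (not real output).
unicode_index = {
    'name': 'stringz', 'field_name': 'stringz',
    'pandas_type': 'unicode', 'numpy_type': 'object',
    'metadata': {'encoding': 'UTF-8'},
}
# Assumed bytes case: no encoding metadata at all.
bytes_index = dict(unicode_index, pandas_type='bytes', metadata=None)

check_string_column_index(unicode_index, py2=False)  # passes
check_string_column_index(bytes_index, py2=True)     # passes: check skipped
```

Guarding the check, rather than deleting it, keeps the encoding assertion on Python 3 while letting the bytes case on Python 2 pass if the assumption holds.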
----------------------------------------------------------------
> [Python] Add field_name to pandas index metadata
> ------------------------------------------------
>
> Key: ARROW-1895
> URL: https://issues.apache.org/jira/browse/ARROW-1895
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.7.1
> Reporter: Phillip Cloud
> Assignee: Phillip Cloud
> Labels: pull-request-available
> Fix For: 0.8.0
>
>
> See the discussion here for details:
> https://github.com/pandas-dev/pandas/pull/18201
> In short, we need a way to map index column names to field names in an Arrow
> Table.
> Additionally, we currently depend on the index columns being written at the end
> of the table; fixing this would allow us to read metadata written by other
> systems (e.g., fastparquet) that don't make this assumption.
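A minimal sketch of how a reader could use such a mapping to find index columns by name instead of by position. It assumes the pandas metadata's `index_columns` list holds field names and that each entry in its `columns` list carries a `field_name` next to its `name` (the same pair the test checks on `column_indexes`). `index_name_map` and `index_positions` are hypothetical helpers, not pyarrow API, and the metadata below is illustrative and trimmed to the keys the sketch uses.

```python
import json


def index_name_map(pandas_metadata):
    # Hypothetical helper: map each index field name to its pandas-level
    # name (None if the metadata has no entry for that field).
    name_by_field = {col['field_name']: col['name']
                     for col in pandas_metadata['columns']}
    return {field: name_by_field.get(field)
            for field in pandas_metadata['index_columns']}


def index_positions(schema_names, pandas_metadata):
    # Hypothetical helper: locate index columns by field name rather than
    # assuming they are the last columns of the table.
    position_of = {name: i for i, name in enumerate(schema_names)}
    return [position_of[field] for field in pandas_metadata['index_columns']]


# Illustrative metadata for a table whose index column comes first,
# as a writer that does not append index columns might lay it out.
meta = json.loads('''
{
  "index_columns": ["idx_field"],
  "columns": [
    {"name": "idx", "field_name": "idx_field"},
    {"name": "a", "field_name": "a"}
  ]
}
''')
schema_names = ['idx_field', 'a']

print(index_positions(schema_names, meta))  # [0]
print(index_name_map(meta))                 # {'idx_field': 'idx'}
```

Because the lookup goes through field names, the same code finds the index whether it is stored first, last, or anywhere in between.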