[
https://issues.apache.org/jira/browse/ARROW-1291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16105491#comment-16105491
]
Li Jin commented on ARROW-1291:
-------------------------------
The use case I have is that I am passing a user-provided pandas DataFrame to
Spark using Arrow. In my particular case, I don't care about the column names
in the pandas DataFrame because the column names are defined in Spark's
schema, so it's weird to ask people to write out their column names in pandas
just to throw them away later...
I think casting numeric column names to strings would be friendlier behavior
than throwing this exception. My use case is a bit special in that I don't care
about the column names, so I could do the casting in my code. But I think other
users might also find the current behavior surprising.
I agree it's probably not worth it for Arrow to preserve the numeric column
names.
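For reference, the caller-side cast mentioned above could look roughly like
the sketch below. The helper name with_string_column_names is made up for
illustration and is not part of pandas or pyarrow; the pyarrow call is left as
a comment because whether it succeeds depends on the installed version.

```python
import pandas as pd


def with_string_column_names(df):
    """Return a copy of df whose column labels are all str.

    Hypothetical helper for illustration only; not a pandas or pyarrow API.
    """
    out = df.copy()
    # Convert every label (e.g. the integer 0) to its string form ("0").
    out.columns = [str(name) for name in out.columns]
    return out


df = pd.DataFrame([1])  # single column labelled with the integer 0
safe_df = with_string_column_names(df)

# Assumption: with string labels, the call from the report no longer trips
# the "Column name must be a string" check in pandas_compat.py:
#   import pyarrow as pa
#   batch = pa.RecordBatch.from_pandas(safe_df)
```

This leaves the original DataFrame untouched, which matters when the user
still needs the numeric labels on the pandas side.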
> [Python] pa.RecordBatch.from_pandas doesn't accept DataFrame with numeric
> column names
> --------------------------------------------------------------------------------------
>
> Key: ARROW-1291
> URL: https://issues.apache.org/jira/browse/ARROW-1291
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.5.0
> Reporter: Li Jin
> Priority: Minor
>
> {code}
> import pyarrow as pa
> import pandas as pd
> df = pd.DataFrame([1])
> pa.RecordBatch.from_pandas(df)
> {code}
> Exception:
> {code}
> TypeError Traceback (most recent call last)
> <ipython-input-5-670ba4a2ddb2> in <module>()
> 3
> 4 df = pd.DataFrame([1])
> ----> 5 pa.RecordBatch.from_pandas(df)
> table.pxi in pyarrow.lib.RecordBatch.from_pandas()
> table.pxi in pyarrow.lib._dataframe_to_arrays()
> /home/icexelloss/miniconda3/envs/spark-dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py in construct_metadata(df, index_levels, preserve_index, types)
> 187 arrow_type=arrow_type
> 188 )
> --> 189 for name, arrow_type in zip(df.columns, df_types)
> 190 ] + (
> 191 [
> /home/icexelloss/miniconda3/envs/spark-dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py in <listcomp>(.0)
> 187 arrow_type=arrow_type
> 188 )
> --> 189 for name, arrow_type in zip(df.columns, df_types)
> 190 ] + (
> 191 [
> /home/icexelloss/miniconda3/envs/spark-dev/lib/python3.5/site-packages/pyarrow/pandas_compat.py in get_column_metadata(column, name, arrow_type)
> 125 raise TypeError(
> 126 'Column name must be a string. Got column {} of type {}'.format(
> --> 127 name, type(name).__name__
> 128 )
> 129 )
> TypeError: Column name must be a string. Got column 0 of type int64
> {code}