Thomas Buhrmann created ARROW-8498:
--------------------------------------

             Summary: Schema.from_pandas fails on extension type, while 
Table.from_pandas works
                 Key: ARROW-8498
                 URL: https://issues.apache.org/jira/browse/ARROW-8498
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.16.0
            Reporter: Thomas Buhrmann


While Table.from_pandas() works as expected with pandas extension types,
Schema.from_pandas() raises an ArrowTypeError:

{code:python}
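import pandas as pd
import pyarrow as pa

df = pd.DataFrame({
    "x": pd.Series([1, 2, None], dtype="Int8"),
    "y": pd.Series(["a", "b", None], dtype="category"),
    "z": pd.Series(["ab", "bc", None], dtype="string"),
})
# Works: prints the inferred schema, including the pandas metadata.
print(pa.Table.from_pandas(df).schema)
# Fails: raises ArrowTypeError.
print(pa.Schema.from_pandas(df))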
{code}
 
Results in:

{noformat}
x: int8
y: dictionary<values=string, indices=int8, ordered=0>
z: string
metadata
--------
{b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
            b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
            b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
            b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
            b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
            b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
            b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
            b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
            b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
            b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
            b'as_version": "1.0.3"}'}

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
...
ArrowTypeError: Did not pass numpy.dtype object
{noformat}

I would expect Table.from_pandas(df).schema and Schema.from_pandas(df) to
return equal schemas.



