Thomas Buhrmann created ARROW-8498:
--------------------------------------

             Summary: Schema.from_pandas fails on extension type, while Table.from_pandas works
                 Key: ARROW-8498
                 URL: https://issues.apache.org/jira/browse/ARROW-8498
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.16.0
            Reporter: Thomas Buhrmann

Table.from_pandas() works as expected with pandas extension types, but Schema.from_pandas() raises an ArrowTypeError on the same DataFrame:

{code:python}
import pandas as pd
import pyarrow as pa

df = pd.DataFrame({
    "x": pd.Series([1, 2, None], dtype="Int8"),
    "y": pd.Series(["a", "b", None], dtype="category"),
    "z": pd.Series(["ab", "bc", None], dtype="string"),
})

print(pa.Table.from_pandas(df).schema)  # works
print(pa.Schema.from_pandas(df))        # raises ArrowTypeError
{code}

Results in:

{noformat}
x: int8
y: dictionary<values=string, indices=int8, ordered=0>
z: string
metadata
--------
{b'pandas': b'{"index_columns": [{"kind": "range", "name": null, "start": 0, "'
            b'stop": 3, "step": 1}], "column_indexes": [{"name": null, "field_'
            b'name": null, "pandas_type": "unicode", "numpy_type": "object", "'
            b'metadata": {"encoding": "UTF-8"}}], "columns": [{"name": "x", "f'
            b'ield_name": "x", "pandas_type": "int8", "numpy_type": "Int8", "m'
            b'etadata": null}, {"name": "y", "field_name": "y", "pandas_type":'
            b' "categorical", "numpy_type": "int8", "metadata": {"num_categori'
            b'es": 2, "ordered": false}}, {"name": "z", "field_name": "z", "pa'
            b'ndas_type": "unicode", "numpy_type": "string", "metadata": null}'
            b'], "creator": {"library": "pyarrow", "version": "0.16.0"}, "pand'
            b'as_version": "1.0.3"}'}

---------------------------------------------------------------------------
ArrowTypeError                            Traceback (most recent call last)
...
ArrowTypeError: Did not pass numpy.dtype object
{noformat}

I would expect Table.from_pandas(df).schema and Schema.from_pandas(df) to produce the same schema. Is that not the intended behavior?