[ https://issues.apache.org/jira/browse/ARROW-9147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated ARROW-9147:
----------------------------------
    Labels: dataset dataset-dask-integration pull-request-available  (was: dataset dataset-dask-integration)

> [C++][Dataset] Support null -> other type promotion in Dataset scanning
> -----------------------------------------------------------------------
>
>                 Key: ARROW-9147
>                 URL: https://issues.apache.org/jira/browse/ARROW-9147
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++
>            Reporter: Joris Van den Bossche
>            Assignee: Ben Kietzman
>            Priority: Major
>              Labels: dataset, dataset-dask-integration, pull-request-available
>             Fix For: 2.0.0
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> With regard to schema evolution / normalization, we support inserting nulls
> for a missing column, changing nullability, and normalizing column order,
> but we do not yet seem to support promotion of the null type to any other type.
> Small Python example:
> {code}
> In [11]: df = pd.DataFrame({"col": np.array([None, None, None, None], dtype='object')})
>     ...: df.to_parquet("test_filter_schema.parquet", engine="pyarrow")
>     ...:
>     ...: import pyarrow.dataset as ds
>     ...: dataset = ds.dataset("test_filter_schema.parquet", format="parquet", schema=pa.schema([("col", pa.int64())]))
>     ...: dataset.to_table()
> ...
> ~/scipy/repos/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.Dataset.to_table()
> ~/scipy/repos/arrow/python/pyarrow/_dataset.pyx in pyarrow._dataset.Scanner.to_table()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.pyarrow_internal_check_status()
> ~/scipy/repos/arrow/python/pyarrow/error.pxi in pyarrow.lib.check_status()
> ArrowTypeError: fields had matching names but differing types. From: col: null To: col: int64
> {code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
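
To make the requested behaviour concrete, here is a minimal pure-Python sketch of the per-field reconciliation the issue describes: fill nulls for a missing column, follow the dataset schema's column order, and (the new part) promote a null-typed column to the requested type. It is only an illustrative model, not Arrow's implementation. The names `reconcile_field` and `reconcile`, the action labels, and the use of plain strings for types are all assumptions made for this example. Only the error text is taken from the traceback in the issue.

```python
# Hypothetical model of schema reconciliation during a dataset scan.
# Schemas are dicts mapping column name -> type name (plain strings).
# None of these helpers are Arrow APIs; they only illustrate the rule.

def reconcile_field(name, file_schema, dataset_schema):
    """Return (action, target_type) for reading `name` as the dataset type."""
    target = dataset_schema[name]
    if name not in file_schema:
        # Already supported per the issue: a missing column is filled with nulls.
        return ("fill_nulls", target)
    source = file_schema[name]
    if source == target:
        return ("keep", target)
    if source == "null":
        # Requested behaviour: an all-null column can become any other type.
        return ("promote_null", target)
    # Any other mismatch stays an error, mirroring the message in the issue.
    raise TypeError(
        "fields had matching names but differing types. "
        f"From: {name}: {source} To: {name}: {target}"
    )


def reconcile(file_schema, dataset_schema):
    """Plan one action per column, in the dataset schema's column order."""
    return [
        (name, *reconcile_field(name, file_schema, dataset_schema))
        for name in dataset_schema
    ]


# The case from the issue: the file stores `col` as null, the dataset asks for int64.
plan = reconcile({"col": "null"}, {"col": "int64"})
```

Under this model the example from the issue produces a `promote_null` plan entry instead of raising, while a genuine conflict such as string to int64 still raises.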