[
https://issues.apache.org/jira/browse/ARROW-5353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rok Mihevc updated ARROW-5353:
------------------------------
External issue URL: https://github.com/apache/arrow/issues/21812
> 0-row table can be written but not read
> ---------------------------------------
>
> Key: ARROW-5353
> URL: https://issues.apache.org/jira/browse/ARROW-5353
> Project: Apache Arrow
> Issue Type: Bug
> Components: C++, Python
> Affects Versions: 0.11.0, 0.12.0, 0.13.0
> Reporter: Thomas Buhrmann
> Priority: Major
>
> I can serialize a table with 0 rows, but cannot read it back. The following code
> {code}
> import pandas as pd
> import pyarrow as pa
> df = pd.DataFrame({'x': [0,1,2]})[:0]
> fnm = "tbl.arr"
> tbl = pa.Table.from_pandas(df)
> print(tbl.schema)
> writer = pa.RecordBatchStreamWriter(fnm, tbl.schema)
> writer.write_table(tbl)
> reader = pa.RecordBatchStreamReader(fnm)
> tbl2 = reader.read_all()
> {code}
> ...results in the following output:
> {code}
> x: int64
> metadata
> --------
> OrderedDict([(b'pandas',
> b'{"index_columns": [{"kind": "range", "name": null, "start": '
> b'0, "stop": 0, "step": 1}], "column_indexes": [{"name": null,'
> b' "field_name": null, "pandas_type": "unicode", "numpy_type":'
> b' "object", "metadata": {"encoding": "UTF-8"}}], "columns": ['
> b'{"name": "x", "field_name": "x", "pandas_type": "int64", "nu'
> b'mpy_type": "int64", "metadata": null}], "creator": {"library'
> b'": "pyarrow", "version": "0.13.0"}, "pandas_version":
> null}')])
> ---------------------------------------------------------------------------
> ArrowInvalid Traceback (most recent call last)
> <ipython-input-3-8869ad9b37c6> in <module>
> 11 writer.write_table(tbl)
> 12
> ---> 13 reader = pa.RecordBatchStreamReader(fnm)
> 14 tbl2 = reader.read_all()
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.py in
> __init__(self, source)
> 56 """
> 57 def __init__(self, source):
> ---> 58 self._open(source)
> 59
> 60
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/ipc.pxi in
> pyarrow.lib._RecordBatchStreamReader._open()
> ~/anaconda/envs/grapy/lib/python3.6/site-packages/pyarrow/error.pxi in
> pyarrow.lib.check_status()
> ArrowInvalid: Expected schema message in stream, was null or length 0
> {code}
> Since the schema alone should be sufficient to build a table, even one with
> no actual data, I would expect this not to fail but to return the same 0-row
> table that was written.
>