sugibuchi opened a new issue, #38012: URL: https://github.com/apache/arrow/issues/38012
### Describe the bug, including details regarding any error messages, version, and platform.

The documentation of `pyarrow.dataset.dataset` says this function accepts a `RecordBatchReader` as `source`:

> **(List of) batches or tables, iterable of batches, or RecordBatchReader:**
> Create an InMemoryDataset. If an iterable or empty list is given, a schema must also be given. If an iterable or RecordBatchReader is given, the resulting dataset can only be scanned once; further attempts will raise an error.
>
> https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html#pyarrow-dataset-dataset

However, `pyarrow.dataset.dataset` raises `TypeError` when it is called with a `RecordBatchReader` as `source`.

### Environment

* OS: Ubuntu 22.04
* Python: Python 3.9.18
* PyArrow: 13.0.0

### POC

```python
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.Table.from_pydict({
    "col_1": list(range(0, 10000)),
    "col_2": [f"v{v}" for v in range(0, 10000)]
})

# Split the table into record batches of at most 100 rows each.
batches = table.to_batches(max_chunksize=100)

# A list of batches is accepted.
ds.dataset(batches)
# -> <pyarrow._dataset.InMemoryDataset at ...>

# A RecordBatchReader over the same batches is rejected.
batch_reader = pa.RecordBatchReader.from_batches(table.schema, batches)
ds.dataset(batch_reader)
# -> Fail!
```

The last line fails with the following error.

```
File /opt/conda/lib/python3.9/site-packages/pyarrow/dataset.py:793, in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
    791     return _in_memory_dataset(source, **kwargs)
    792 else:
--> 793     raise TypeError(
    794         'Expected a path-like, list of path-likes or a list of Datasets '
    795         'instead of the given type: {}'.format(type(source).__name__)
    796     )

TypeError: Expected a path-like, list of path-likes or a list of Datasets instead of the given type: RecordBatchReader
```

### Component(s)

Python
