sugibuchi opened a new issue, #38012:
URL: https://github.com/apache/arrow/issues/38012

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   The documentation of `pyarrow.dataset.dataset` says this function accepts 
`RecordBatchReader` as `source`.
   
   > **(List of) batches or tables, iterable of batches, or RecordBatchReader:**
   > Create an InMemoryDataset. If an iterable or empty list is given, a schema 
must also be given. If an iterable or RecordBatchReader is given, the resulting 
dataset can only be scanned once; further attempts will raise an error.
   > 
https://arrow.apache.org/docs/python/generated/pyarrow.dataset.dataset.html#pyarrow-dataset-dataset
   
   However, `pyarrow.dataset.dataset` raises `TypeError` when it is called 
with a `RecordBatchReader` as `source`.
   
   ### Environment
   * OS: Ubuntu 22.04
   * Python: Python 3.9.18
   * PyArrow: 13.0.0
   
   ### POC
   
   ```python
   import pyarrow as pa
   import pyarrow.dataset as ds
   
   table = pa.Table.from_pydict({
       "col_1": list(range(0, 10000)),
       "col_2": [f"v{v}" for v in range(0, 10000)]
   })
   
   batches = table.to_batches(max_chunksize=100)
   
   ds.dataset(batches) # -> <pyarrow._dataset.InMemoryDataset at ...>
   
   batch_reader = pa.RecordBatchReader.from_batches(table.schema, batches)
   ds.dataset(batch_reader) # -> Fail!
   ```
   
   The last line fails with the following error.
   
   ```
   File /opt/conda/lib/python3.9/site-packages/pyarrow/dataset.py:793, in dataset(source, schema, format, filesystem, partitioning, partition_base_dir, exclude_invalid_files, ignore_prefixes)
       791     return _in_memory_dataset(source, **kwargs)
       792 else:
   --> 793     raise TypeError(
       794         'Expected a path-like, list of path-likes or a list of Datasets '
       795         'instead of the given type: {}'.format(type(source).__name__)
       796     )
   
   TypeError: Expected a path-like, list of path-likes or a list of Datasets instead of the given type: RecordBatchReader
   ```
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
