danielz02 commented on issue #14229:
URL: https://github.com/apache/arrow/issues/14229#issuecomment-2123686924

   > Would `iter_batches()` be OK as a workaround?
   
   @mapleFU I don't think so. I was using `IterableDataset` from Hugging Face, which calls `iter_batches()`, and I got a similar error.
   
   ```text
     File "/home/xxxx/miniforge3/envs/geospatial/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 1385, in __iter__
       for key, pa_table in iterator:
     File "/home/xxxx/miniforge3/envs/geospatial/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 167, in _batch_arrow_tables
       for key, pa_table in iterable:
     File "/home/xxxx/miniforge3/envs/geospatial/lib/python3.11/site-packages/datasets/iterable_dataset.py", line 289, in _iter_arrow
       yield from self.generate_tables_fn(**self.kwargs)
     File "/home/xxxx/miniforge3/envs/geospatial/lib/python3.11/site-packages/datasets/packaged_modules/parquet/parquet.py", line 90, in _generate_tables
       for batch_idx, record_batch in enumerate(
     File "pyarrow/_parquet.pyx", line 1587, in iter_batches
     File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
   OSError: List index overflow.
   ```


