danking commented on issue #47279:
URL: https://github.com/apache/arrow/issues/47279#issuecomment-3214534792

   I have likewise encountered this issue while implementing a Ray Datasource 
for the [Vortex file format](https://github.com/vortex-data/vortex). It's somewhat 
cheaper for Vortex to yield PyArrow Tables rather than Pandas DataFrames for 
fiddly reasons related to Pandas' Arrow support. Ray then fails when it 
estimates the block size via `Table.nbytes` on a table containing a 
`string_view` column:
   
   ```
   _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
   pyarrow/table.pxi:6307: in pyarrow.lib.concat_tables
       ???
   test/test_datasource.py:38: in <genexpr>
       tbl = pa.concat_tables(pa.Table.from_pydict(x) for x in ds.iter_batches())  # pyright: ignore[reportUnknownMemberType, reportArgumentType]
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ../.venv/lib/python3.11/site-packages/ray/data/iterator.py:185: in _create_iterator
       ) = self._to_ref_bundle_iterator()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ../.venv/lib/python3.11/site-packages/ray/data/_internal/iterator/iterator_impl.py:27: in _to_ref_bundle_iterator
       ref_bundles_iterator, stats = self._base_dataset._execute_to_iterator()
                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   ../.venv/lib/python3.11/site-packages/ray/data/dataset.py:6103: in _execute_to_iterator
       bundle_iter, stats, executor = self._plan.execute_to_iterator()
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   E               ray.exceptions.RayTaskError(ArrowTypeError): ray::ReadVortex->SplitBlocks(100)() (pid=90297, ip=127.0.0.1)
   E                   for b_out in map_transformer.apply_transform(iter(blocks), ctx):
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 601, in __call__
   E                   for block in blocks:
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/execution/operators/map_transformer.py", line 534, in __call__
   E                   while output_buffer.has_next():
   E                         ^^^^^^^^^^^^^^^^^^^^^^^^
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/output_buffer.py", line 95, in has_next
   E                   self._exceeded_buffer_row_limit() or self._exceeded_buffer_size_limit()
   E                                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/output_buffer.py", line 85, in _exceeded_buffer_size_limit
   E                   and self._buffer.get_estimated_memory_usage()
   E                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/delegating_block_builder.py", line 76, in get_estimated_memory_usage
   E                   return self._builder.get_estimated_memory_usage()
   E                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/table_block.py", line 155, in get_estimated_memory_usage
   E                   self._tables_size_bytes += BlockAccessor.for_block(table).size_bytes()
   E                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   E                 File "/Users/danielking/projects/vortex/.venv/lib/python3.11/site-packages/ray/data/_internal/arrow_block.py", line 320, in size_bytes
   E                   return self._table.nbytes
   E                          ^^^^^^^^^^^^^^^^^^
   E                 File "pyarrow/table.pxi", line 5303, in pyarrow.lib.Table.nbytes.__get__
   E                 File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
   E                 File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
   E               pyarrow.lib.ArrowTypeError: Extracting byte ranges not supported for type string_view
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
