[I] adding iterrow [arrow]

via GitHub Wed, 06 Dec 2023 02:45:52 -0800


thunderbug1 opened a new issue, #39099:
URL: https://github.com/apache/arrow/issues/39099


   ### Describe the enhancement requested
   
   I am surprised that there is no iterrow feature in pyarrow. Is there a 
special reason why?
   I realize that the data structures are probably not optimized for row-wise 
access, but it is such a basic feature that I think it could be added anyway.
   
   Here is what I use at the moment. It is certainly not optimized, but it 
does the job:
   
   ```python
   import pyarrow as pa
   from typing import Iterator, Dict, Any
   
   def iter_rows(table: pa.Table, chunk_size: int = 10000) -> Iterator[Dict[str, Any]]:
       """
       Generator function to iterate over rows of a PyArrow table in chunks.
   
       :param table: PyArrow table.
       :param chunk_size: The number of rows per chunk for processing.
       :yield: Dictionary representing each row.
       """
       for chunk in table.to_batches(chunk_size):
           columns_dict = chunk.to_pydict()
           for i in range(chunk.num_rows):
               yield {col: columns_dict[col][i] for col in columns_dict}
   
   # Example usage
   data = pa.Table.from_pydict({
       'column1': [1, 2, 3],
       'column2': ['a', 'b', 'c']
   })
   
   for row in iter_rows(data):
       print(row)
   
   ```
   
   ### Component(s)
   
   Python


