westonpace commented on issue #13827:
URL: https://github.com/apache/arrow/issues/13827#issuecomment-1209847707
@drin is correct, these functions are not exposed in pyarrow at the moment.
However, from pyarrow, if you use the datasets API to read those files, it
should achieve the desired effect:
```
import pyarrow.dataset as ds
my_dataset = ds.dataset(['/tmp/my_ipc.arrow'], format='arrow')
my_table = my_dataset.to_table()
```
Another benefit: if you can save your file as multiple record batches, you
can start processing before the entire file has been loaded into memory. I'm
not sure whether that is workable for you or not:
```
import pyarrow.dataset as ds
my_dataset = ds.dataset(['/tmp/my_ipc.arrow'], format='arrow')
for next_batch in my_dataset.to_batches():
    # Do something with the batch
    print(next_batch)
```