amoeba commented on issue #40576: URL: https://github.com/apache/arrow/issues/40576#issuecomment-2010974700
Can you share some code to show how you're approaching this? The ultimate thing you're trying to do is possible and part of the design of the Datasets module, see https://arrow.apache.org/docs/python/dataset.html#writing-large-amounts-of-data. Also be aware of the considerations in https://arrow.apache.org/docs/python/dataset.html#partitioning-performance-considerations.

What you most likely don't want to be doing here is converting one or more Parquet files into an Arrow Table, then converting that Arrow Table to one or more Parquet files.
