jorgecarleitao commented on pull request #7729:
URL: https://github.com/apache/arrow/pull/7729#issuecomment-657614516


   One thing that is not yet clear to me is the idiom for handling RecordBatches 
and partitions. My understanding is that partitions can be executed in parallel 
(one thread each), whereas the RecordBatches within a partition are generally 
processed on the same thread, i.e. we normally loop over them sequentially.
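
   To make that understanding concrete, here is a minimal sketch using only 
plain `std::thread`, not DataFusion's actual API. `Batch`, `Partition`, and 
`sum_partitions` are hypothetical stand-ins I made up for illustration: one 
thread per partition, with the batches inside a partition looped over 
sequentially on that thread.

```rust
use std::thread;

// Hypothetical stand-ins (NOT Arrow/DataFusion types): a "batch" is a
// plain vector of values, and a "partition" is an ordered list of batches.
type Batch = Vec<i64>;
type Partition = Vec<Batch>;

/// Sums every partition on its own thread. Within a partition, the
/// batches are visited one after another on that same thread.
fn sum_partitions(partitions: Vec<Partition>) -> Vec<i64> {
    // Spawn one thread per partition (assumption: partitions are independent).
    let handles: Vec<thread::JoinHandle<i64>> = partitions
        .into_iter()
        .map(|partition| {
            thread::spawn(move || {
                let mut total = 0;
                // Sequential loop over the batches of this partition.
                for batch in &partition {
                    total += batch.iter().sum::<i64>();
                }
                total
            })
        })
        .collect();

    // Join in spawn order, so results line up with the input partitions.
    handles
        .into_iter()
        .map(|handle| handle.join().expect("partition thread panicked"))
        .collect()
}
```

   Only one batch per partition needs to be looked at at a time here, which 
is the memory argument I am asking about; whether the real execution model 
works this way is exactly what I would like confirmed.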
   
   Is the goal of RecordBatch to split a partition into smaller chunks of data 
in order to limit memory usage?
   
   In this PR, I have not merged all the RecordBatches within a given partition 
into a single batch; instead, I kept them separate. I am not sure whether this 
is the correct approach here.

