westonpace commented on issue #34363:
URL: https://github.com/apache/arrow/issues/34363#issuecomment-1548348207

   This seems like a legitimate request and quite workable.  We are already 
close.  The code in ObjectOutputStream is roughly:
   
   ```
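   # Large writes bypass the buffer and are submitted directly
   if request.size > part_limit:
     submit_request(request)
     return
   # Smaller writes accumulate until the buffer exceeds the part limit
   buffer.append(request)
   if buffer.size > part_limit:
     submit_request(buffer)
     buffer.reset()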
   ```
   
   Given that we are already talking about cloud upload and I/O, I think we 
can implement the equal-parts approach directly (instead of trying to maintain 
both strategies) without much of a performance hit.  There will be some hit, 
since this introduces a mandatory extra copy of the data in some cases.  This 
would change the above logic to:
   
   ```
   buffer.append(request)
   # Submit every whole part_limit-sized chunk; the remainder stays buffered
   for chunk in slice_off_whole_chunks(buffer, part_limit):
     submit_request(chunk)
   ```
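
   To make the idea concrete, here is a minimal Python sketch of the 
equal-parts logic.  It is only an illustration, not the actual C++ 
implementation: `EqualPartBuffer`, `write`, and `close` are hypothetical 
names, and `submit_request` is assumed to be any callable that uploads one 
part.  The final flush in `close` is also my assumption: whatever is left 
over goes out as a last part that may be smaller than `part_limit`.

```python
class EqualPartBuffer:
    """Hypothetical helper: accumulates writes and emits fixed-size parts."""

    def __init__(self, part_limit, submit_request):
        if part_limit <= 0:
            raise ValueError("part_limit must be positive")
        self.part_limit = part_limit
        self.submit_request = submit_request  # assumed: uploads one part
        self.buffer = bytearray()

    def write(self, request: bytes) -> None:
        # Every write is copied into the buffer (the extra copy noted above).
        self.buffer += request
        # Slice off each whole part; keep any remainder for the next write.
        offset = 0
        while len(self.buffer) - offset >= self.part_limit:
            end = offset + self.part_limit
            self.submit_request(bytes(self.buffer[offset:end]))
            offset = end
        del self.buffer[:offset]

    def close(self) -> None:
        # Assumption: the last part is allowed to be shorter than part_limit.
        if self.buffer:
            self.submit_request(bytes(self.buffer))
            self.buffer.clear()
```

   With this shape, every part except possibly the last has exactly 
`part_limit` bytes, regardless of how the caller sizes its writes.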
   
   Does anyone want to create a PR?

