[GitHub] [arrow] zeroshade commented on issue #36095: Make `WriteBuffered` a default

via GitHub Thu, 15 Jun 2023 08:42:42 -0700


zeroshade commented on issue #36095:
URL: https://github.com/apache/arrow/issues/36095#issuecomment-1593315358


   The difference makes sense to me. As you pointed out, it's likely due to the 
overhead of writing many separate small row groups rather than one big row 
group. One suggestion: gather the records into an `arrow.Table` and try 
`WriteTable` with a large chunk size to see what the performance is like, 
since it uses buffered writing internally.
   
   My personal opinion is still that I wouldn't necessarily want to change the 
default behavior out from under people. We could, however, improve the 
documentation and comments on the respective functions.
   
   Alternatively, we could add a Write method that uses buffered or 
non-buffered writing depending on whether the consumer called `NewRowGroup` or 
`NewBufferedRowGroup` themselves. It would write rows only up to the max row 
group length and return how many rows were written, how many were left over, 
or some other indication. That would give consumers more granular control over 
the writing.
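
   To make that last idea concrete, here is a rough sketch of the control flow 
I have in mind. It does not use the real writer: `rowGroupSink`, its fields, 
and `writeAll` are hypothetical stand-ins that only count rows, and the 
`NewRowGroup` / `NewBufferedRowGroup` methods here just record which mode the 
consumer picked. Treat it as an illustration under those assumptions, not a 
proposed signature.

```go
package main

import (
	"errors"
	"fmt"
)

// rowGroupSink is a hypothetical stand-in for a Parquet file writer.
// It only counts rows per row group so the control flow can be shown
// without depending on Arrow.
type rowGroupSink struct {
	maxRowGroupLen int   // maximum number of rows allowed in one row group
	groups         []int // rows written to each row group so far
	buffered       bool  // mode chosen by the consumer for the open group
	open           bool  // whether a row group is currently open
}

// NewRowGroup mimics the consumer opening a non-buffered row group.
func (s *rowGroupSink) NewRowGroup() {
	s.groups = append(s.groups, 0)
	s.buffered, s.open = false, true
}

// NewBufferedRowGroup mimics the consumer opening a buffered row group.
func (s *rowGroupSink) NewBufferedRowGroup() {
	s.groups = append(s.groups, 0)
	s.buffered, s.open = true, true
}

// Write appends up to n rows to the open row group, stopping at the max
// row group length. It reports how many rows were written and how many
// are left over for the caller to place in another row group.
func (s *rowGroupSink) Write(n int) (written, leftover int, err error) {
	if !s.open {
		return 0, n, errors.New("no open row group")
	}
	cur := len(s.groups) - 1
	room := s.maxRowGroupLen - s.groups[cur]
	written = n
	if written > room {
		written = room
	}
	s.groups[cur] += written
	return written, n - written, nil
}

// writeAll shows the consumer side: open a row group, write what fits,
// and repeat with the leftover rows until everything is written.
func writeAll(s *rowGroupSink, total int) error {
	if s.maxRowGroupLen <= 0 {
		return errors.New("maxRowGroupLen must be positive")
	}
	for total > 0 {
		s.NewBufferedRowGroup() // the consumer decides the mode here
		_, leftover, err := s.Write(total)
		if err != nil {
			return err
		}
		total = leftover
	}
	return nil
}

func main() {
	s := &rowGroupSink{maxRowGroupLen: 100}
	if err := writeAll(s, 250); err != nil {
		panic(err)
	}
	fmt.Println(s.groups) // prints [100 100 50]
}
```

   The point of returning both counts is that the consumer, not the library, 
decides when the next row group starts and which kind it is.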


