zeroshade commented on issue #36095: URL: https://github.com/apache/arrow/issues/36095#issuecomment-1593315358
The difference makes sense to me. As you pointed out, it's likely due to the overhead of having many separate small row groups rather than one big row group. One suggestion I'd make is to gather the records into an `arrow.Table`, call `WriteTable` with some large chunk size, and see what the performance is like, since it will use buffered writing internally.

My personal opinion is still that I wouldn't necessarily want to change the default behavior out from under people. We could, however, improve the documentation and comments on the respective functions.

Alternatively, another solution might be to have a Write method which uses buffered or non-buffered writing depending on whether the consumer called `NewRowGroup` or `NewBufferedRowGroup` themselves, writes up to the max row group length, and returns how many rows were written, how many rows were left over, or some other indication. That would give consumers more granular control over the writing.
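To make the last idea concrete, here is a minimal sketch of a Write call that fills the current row group up to the max row group length and reports how many rows were written and how many were left over. Everything in it is an assumption for illustration: `rowGroupWriter`, its `Write` method, and `writeAll` are hypothetical stand-ins that only count rows, not the actual `pqarrow` or parquet `file` API.

```go
package main

import (
	"errors"
	"fmt"
)

// rowGroupWriter is a hypothetical stand-in for a Parquet file writer.
// It is NOT part of the Arrow Go API; it only tracks row counts per row group.
type rowGroupWriter struct {
	maxRowGroupLen int   // maximum number of rows allowed in one row group
	groups         []int // rows written so far to each row group
	open           bool  // true once the consumer has opened a row group
}

// NewBufferedRowGroup opens a fresh row group that later writes append to.
func (w *rowGroupWriter) NewBufferedRowGroup() {
	w.groups = append(w.groups, 0)
	w.open = true
}

// Write appends up to n rows to the current row group, stopping at
// maxRowGroupLen. It returns how many rows were written and how many remain.
func (w *rowGroupWriter) Write(n int) (written, remaining int, err error) {
	if w.maxRowGroupLen <= 0 {
		return 0, n, errors.New("maxRowGroupLen must be positive")
	}
	if !w.open {
		return 0, n, errors.New("no open row group")
	}
	last := len(w.groups) - 1
	room := w.maxRowGroupLen - w.groups[last]
	written = n
	if written > room {
		written = room
	}
	w.groups[last] += written
	return written, n - written, nil
}

// writeAll shows the consumer side: the caller decides when to start a new
// row group, based on the leftover count that Write reports.
func writeAll(w *rowGroupWriter, batchSizes []int) error {
	w.NewBufferedRowGroup()
	for _, n := range batchSizes {
		for n > 0 {
			_, left, err := w.Write(n)
			if err != nil {
				return err
			}
			n = left
			if n > 0 {
				// Current row group is full; open the next one.
				w.NewBufferedRowGroup()
			}
		}
	}
	return nil
}

func main() {
	w := &rowGroupWriter{maxRowGroupLen: 10}
	if err := writeAll(w, []int{4, 4, 4}); err != nil {
		panic(err)
	}
	fmt.Println(w.groups) // prints [10 2]
}
```

With three batches of 4 rows and a max row group length of 10, the third batch is split: 2 rows fill the first row group and the remaining 2 go into a new one. Returning the leftover count leaves the decision of when to start a new row group with the caller.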
