zeroshade commented on issue #1997:
URL: https://github.com/apache/arrow-adbc/issues/1997#issuecomment-2220927999

   With the way the code is currently written, here's the scenario I can think of:
   
   1. The default writer concurrency is `runtime.NumCPU()`, which would explain why the number of files produced matches the number of processors.
   2. The file-size limit is enforced in `bulk_ingest.go` by writing an entire record batch and only then checking how much data has been written. The only check on the writer side itself is a maximum number of rows, which isn't set by default. This means that if the incoming stream produces large enough record batches, a whole batch is written to the file before the size check runs. @Zan-L can you confirm whether your record batch reader is producing significantly large batches? Would it be possible for the provided record batch reader to slice the batches before passing them on (see the sketch after this list), making them smaller and allowing the file sizes to be more easily limited?
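
   To illustrate the kind of slicing I have in mind on the caller's side, here's a rough sketch of wrapping an existing `array.RecordReader` so each batch is re-emitted in slices of at most `maxRows` rows before being handed to the statement. The `slicingReader` type, `newSlicingReader`, and `maxRows` are hypothetical names for illustration only (not part of the driver), and the import paths may differ depending on your Arrow Go version:
   
   ```go
   // Package ingestutil is a hypothetical helper package for this sketch.
   package ingestutil
   
   import (
   	"github.com/apache/arrow/go/v17/arrow"
   	"github.com/apache/arrow/go/v17/arrow/array"
   )
   
   // slicingReader wraps an array.RecordReader and re-emits each incoming
   // batch as a sequence of slices no longer than maxRows rows.
   type slicingReader struct {
   	array.RecordReader              // inner reader (provides Schema, Err, Retain, Release)
   	maxRows            int64
   	rec                arrow.Record // batch currently being sliced
   	offset             int64        // next row of rec to emit
   	cur                arrow.Record // slice returned by Record()
   }
   
   func newSlicingReader(inner array.RecordReader, maxRows int64) *slicingReader {
   	return &slicingReader{RecordReader: inner, maxRows: maxRows}
   }
   
   func (r *slicingReader) Next() bool {
   	if r.cur != nil {
   		r.cur.Release()
   		r.cur = nil
   	}
   	// Pull the next full batch from the inner reader once the current
   	// one has been fully emitted.
   	if r.rec == nil || r.offset >= r.rec.NumRows() {
   		if r.rec != nil {
   			r.rec.Release()
   			r.rec = nil
   		}
   		if !r.RecordReader.Next() {
   			return false
   		}
   		r.rec = r.RecordReader.Record()
   		r.rec.Retain() // keep the batch alive while we slice it
   		r.offset = 0
   	}
   	n := r.rec.NumRows() - r.offset
   	if n > r.maxRows {
   		n = r.maxRows
   	}
   	r.cur = r.rec.NewSlice(r.offset, r.offset+n)
   	r.offset += n
   	return true
   }
   
   func (r *slicingReader) Record() arrow.Record { return r.cur }
   ```
   
   The wrapped reader could then be passed to `BindStream` in place of the original one, so each file's size check runs after a much smaller write. A complete implementation would also override `Release` to clean up any outstanding `rec`/`cur` references.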

