zhztheplayer commented on PR #6009:
URL: https://github.com/apache/incubator-gluten/pull/6009#issuecomment-2159755195

> Just a comment on this: `minBatchSize` is not accurate either. The accurate description of the batch size is "Velox will try its best to limit the number of rows per rowVector to the maxBatchSize config; however, it's not guaranteed." @zhztheplayer can you update the document to highlight this?

The tricky part is that I don't see that we consistently follow either the `min` or the `max` criterion when using the option (correct me if I am wrong). It's used as a `min` in the shuffle reader, but may be used as a `max` in the shuffle writer. I haven't gone through the scan code, so I'm not sure about that part.

Maybe we can rename the option to `targetBatchSize` to clarify. It's actually not that important whether a batch is slightly larger or smaller than this size.
