FelixYBW commented on PR #5951: URL: https://github.com/apache/incubator-gluten/pull/5951#issuecomment-2144449307
Thank you for the improvement. The ideal case for the current split function is an input batch that is as large as possible while all of its columns still fit in the L2 cache. Once the column data no longer fits in the L2 cache, split performance drops dramatically. On the other hand, if the batch size is too small, the overhead of preparing the split can't be ignored. There should be room to improve this case. Also, in the current implementation we must flatten the RowVector before splitting. The flatten itself also has overhead, which has not been analyzed.
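
To make the batch-size trade-off concrete, here is a rough sketch of how a row count could be bounded by a cache budget. This is only an illustration: `rowsFittingInCache` is a hypothetical helper, not part of Gluten or Velox, and it assumes fixed-width columns only, ignoring null buffers, variable-width data, and the split's own output buffers. The cache budget, `minRows`, and `maxRows` are assumed inputs the caller would have to choose.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not a Gluten or Velox API).
// Given the width in bytes of one value in each fixed-width column, return
// the largest row count whose column data fits in cacheBudgetBytes.
// The result is clamped to [minRows, maxRows]: minRows keeps batches from
// getting so small that per-batch split preparation dominates, and maxRows
// caps the batch size. Requires minRows <= maxRows.
std::size_t rowsFittingInCache(const std::vector<std::size_t>& bytesPerValue,
                               std::size_t cacheBudgetBytes,
                               std::size_t minRows,
                               std::size_t maxRows) {
  // Bytes needed to hold one row across all columns.
  const std::size_t rowBytes = std::accumulate(
      bytesPerValue.begin(), bytesPerValue.end(), std::size_t{0});
  if (rowBytes == 0) {
    return maxRows;  // No column data, so the cache does not constrain us.
  }
  const std::size_t fit = cacheBudgetBytes / rowBytes;
  return std::clamp(fit, minRows, maxRows);
}
```

For example, with two 8-byte and two 4-byte columns (24 bytes per row) and an assumed 1 MiB budget, `rowsFittingInCache({8, 8, 4, 4}, 1 << 20, 1024, 65536)` gives 43690 rows. With a much smaller budget, the lower bound takes over, which is the case where preparation overhead matters most.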
