comphead commented on PR #21351: URL: https://github.com/apache/datafusion/pull/21351#issuecomment-4248227239
Awesome, so the PR uses the morselizer to change who reads which file at runtime. It would be extremely interesting to try this in environments with many small files. Do we expect improvements for even partitions (partitions that have a similar number of files of similar sizes)? Is it planned to morselize deeper, to process row groups in parallel? This work actually reminds me of the https://github.com/apache/datafusion/issues/19815 benchmark.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
