alamb commented on issue #21733: URL: https://github.com/apache/datafusion/issues/21733#issuecomment-4281359699
I think this idea of "heuristically choose the order of files to scan to try and maximize dynamic filter efficiency" is a really neat one.

So where I am heading is that it would be nice to have some sort of generic API like "reorder_files_heuristically" in the FileStream / shared work queue, rather than hard-coding the sortedness heuristic. I realize the topk / sorting case is probably the most important one, but there may be others.

Also, I think setting up a reasonable API will help keep the code structure easier to understand.

Thank you for working on this @zhuqi-lucas
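To make the suggestion a bit more concrete, here is a rough sketch of what such a pluggable hook might look like. Everything in it is hypothetical and only for illustration: `FileInfo`, `FileReorderHeuristic`, `KeepOrder`, `TopKAscending`, and `plan_scan_order` are made-up names, not existing DataFusion types, and the real FileStream / work queue would carry much richer statistics than a single optional minimum.

```rust
use std::cmp::Ordering;

/// Hypothetical, simplified stand-in for a file to scan.
/// (Assumption: not DataFusion's actual file/statistics types.)
#[derive(Debug, Clone, PartialEq)]
pub struct FileInfo {
    pub path: String,
    /// Minimum value of the sort column, if statistics are available.
    pub min_sort_key: Option<i64>,
}

/// Hypothetical hook: a pluggable strategy for choosing the scan order.
pub trait FileReorderHeuristic {
    fn reorder_files_heuristically(&self, files: Vec<FileInfo>) -> Vec<FileInfo>;
}

/// Default strategy: leave the files in the order they were listed.
pub struct KeepOrder;

impl FileReorderHeuristic for KeepOrder {
    fn reorder_files_heuristically(&self, files: Vec<FileInfo>) -> Vec<FileInfo> {
        files
    }
}

/// Sortedness heuristic for something like `ORDER BY key ASC LIMIT k`:
/// scan files with the smallest minimum first, so a topk dynamic filter
/// can tighten early. Files without statistics go last.
pub struct TopKAscending;

impl FileReorderHeuristic for TopKAscending {
    fn reorder_files_heuristically(&self, mut files: Vec<FileInfo>) -> Vec<FileInfo> {
        // `sort_by` is stable, so files with equal keys keep their listed order.
        files.sort_by(|a, b| match (a.min_sort_key, b.min_sort_key) {
            (Some(x), Some(y)) => x.cmp(&y),
            (Some(_), None) => Ordering::Less,
            (None, Some(_)) => Ordering::Greater,
            (None, None) => Ordering::Equal,
        });
        files
    }
}

/// Hypothetical work-queue entry point: it only knows about the trait,
/// not about any particular heuristic.
pub fn plan_scan_order(
    files: Vec<FileInfo>,
    heuristic: &dyn FileReorderHeuristic,
) -> Vec<String> {
    heuristic
        .reorder_files_heuristically(files)
        .into_iter()
        .map(|f| f.path)
        .collect()
}
```

The point of the split is that the queue code depends only on the trait, so the sortedness heuristic becomes one implementation among possibly several rather than something hard-coded into the scan path.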
