Tushar7012 commented on issue #19971: URL: https://github.com/apache/datafusion/issues/19971#issuecomment-3801309698
Thanks for the feedback @BlakeOrth. You raise a valid point about the `object_store` machinery being inherently sequential for listing operations.

A few notes:

1. I accidentally committed some `table.rs` changes to the wrong PR (#19975 - ArrowBytesViewMap optimization). I've now reverted those changes there.
2. Before investing more time on this, I'd like to understand:
   - Are there specific scenarios where parallelizing `list_files_for_scan` would provide measurable benefits (e.g., multiple table paths, or when statistics collection is the bottleneck rather than listing itself)?
   - Would it be more valuable to focus on parallelizing the statistics collection phase (which uses `buffer_unordered`) rather than the file listing phase?
3. I can run some benchmarks on cold query performance to gather actual evidence of improvement (or lack thereof). Would that help inform whether this work is worth pursuing?

Happy to hold off on this until we have clearer direction, or pivot to focus on areas where parallelization would have more impact.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
