Tushar7012 commented on PR #20023: URL: https://github.com/apache/datafusion/pull/20023#issuecomment-3817995177
Hi @2010YOUY01,

Thanks for the detailed feedback. I appreciate you taking the time to call this out.

You’re right that I didn’t clearly articulate the motivating workload or demonstrate why this change improves a concrete query path. That’s on me. The intent behind this PR was to reduce cold-start latency for listing tables with many paths, but I agree that “this looks slow” is not sufficient justification for a change of this size without data.

As a next step, I’ll do the following before asking for further review:

- Identify and document a concrete workload (cold-start query on a listing table with multiple paths).
- Measure baseline vs PR behavior, focusing specifically on planning / file listing time.
- Explain where file listing sits on the critical path for that query and why bounded parallelism helps in this case.
- Update the PR description to clearly capture the motivation, measurements, and internal reasoning.

If the measurements don’t show a meaningful improvement, I’m happy to reconsider or narrow the scope of this change. My goal here is to improve DataFusion’s performance in a way that’s well-motivated and easy to reason about, not to push an optimization without sufficient understanding.

Thanks again for the guidance. I’ll follow up once I have data and a clearer explanation to share.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
