Tushar7012 commented on PR #20023:
URL: https://github.com/apache/datafusion/pull/20023#issuecomment-3817995177

   Hi @2010YOUY01,
   Thanks for the detailed feedback — I appreciate you taking the time to call 
this out.
   
   You’re right that I didn’t clearly articulate the motivating workload or 
demonstrate why this change improves a concrete query path. That’s on me. The 
intent behind this PR was to reduce cold-start latency for listing tables with 
many paths, but I agree that “this looks slow” is not sufficient justification 
for a change of this size without data.
   
   As a next step, I’ll do the following before asking for further review:
   - Identify and document a concrete workload (cold-start query on a listing 
table with multiple paths).
   - Measure baseline vs PR behavior, focusing specifically on planning / file 
listing time.
   - Explain where file listing sits on the critical path for that query and 
why bounded parallelism helps in this case.
   - Update the PR description to clearly capture the motivation, measurements, 
and internal reasoning.
   
   If the measurements don’t show a meaningful improvement, I’m happy to 
reconsider or narrow the scope of this change. My goal here is to improve 
DataFusion’s performance in a way that’s well-motivated and easy to reason 
about, not to push an optimization without sufficient understanding.
   
   Thanks again for the guidance — I’ll follow up once I have data and a 
clearer explanation to share.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

