2010YOUY01 commented on PR #17147:
URL: https://github.com/apache/datafusion/pull/17147#issuecomment-3199797955

   > > I think the extended metrics in this PR are too fine-grained. We rarely 
check them, and the same information can be measured with a flamegraph 
(https://datafusion.apache.org/library-user-guide/profiling.html), so it might 
not be worth implementing them as metrics.
   > > However, for certain metrics that are not possible to obtain from 
flamegraphs (such as, within a single in-memory sort, the average number of 
batches being handled at a time; or the number of merge levels), it would be a 
good idea to include them in the metrics.
   > 
   > When used in distributed compute environments (such as when using 
DataFusion via Comet, which is where this arose), it can get very unwieldy to 
use a flamegraph, and I also don't always have control over how the executable 
was launched. Using metrics was the best way for me to see what was taking my 
time in the SortExec. But I can close this PR if this is not a point of 
interest.
   
   If these metrics can be used to find better Comet tuning, I think including 
them makes sense. I was imagining them as something only DataFusion internal 
developers would care about, checked when optimizing the `SortExec` 
implementation.
   That said, I don't fully understand how to use them to tune applications, so 
I'd recommend including some comments that explain this.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

