rluvaton commented on PR #17147:
URL: https://github.com/apache/datafusion/pull/17147#issuecomment-3195981328

   > FYI @ding-young and @2010YOUY01
   > 
   > @rluvaton could you also help review this PR?
   
   @alamb sure
   
   ------
   
   @ding-young 
   
   > This is a beneficial change. Having detailed sort metrics is valuable, and 
I think the direction of eventually extending this to cover sort-preserving 
merge and multi-level merge makes a lot of sense.
   
   I think so too.
   
   > One small concern I have is that, unlike aggregation where metrics are 
recorded for a single large batch, sorting often involves many small batches, 
each of which would now report timing to `LexSortMetrics`.
   
   Aggregation batches and sort batches should be the same size ideally, what 
do you mean single large batch?
   
   > Since much of the work in the current sort implementation happens in the 
merge phase rather than `sort_batch`, I’m a bit concerned that the relative 
cost of metric reporting within `sort_batch` might become non-trivial in 
practice.
   
   I did not know that actually and these metrics would have shown me.
   
   also, it is depend on many things in order for it to be the case:
   1. number of sort columns
   2. sort columns type
   3. merge degree
   
   -----
   
   > I think the extended metrics in this PR are too fine-grained, since we are 
rarely checking them, and also it's possible to measure those metrics through 
the flamegraph 
([datafusion.apache.org/library-user-guide/profiling.html](https://datafusion.apache.org/library-user-guide/profiling.html)),
 it might not worth to implement them as metrics.
   
   @2010YOUY01 metrics are made for production, flamegraph is where you 
actually can run it to reproduce. having this in metrics would actually benefit 
visibility.
   
   the only problem I have is the naming of the metrics, take to indices and 
lexsort are implementation detail, maybe we can update the naming to be better 
understood by people who are not aware of the internals like calculating sort 
indices for a single batch and sorting which is the `take` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to