rluvaton commented on PR #17147: URL: https://github.com/apache/datafusion/pull/17147#issuecomment-3195981328
> FYI @ding-young and @2010YOUY01 > > @rluvaton could you also help review this PR? @alamb sure ------ @ding-young > This is a beneficial change. Having detailed sort metrics is valuable, and I think the direction of eventually extending this to cover sort-preserving merge and multi-level merge makes a lot of sense. I think so too. > One small concern I have is that, unlike aggregation where metrics are recorded for a single large batch, sorting often involves many small batches, each of which would now report timing to `LexSortMetrics`. Aggregation batches and sort batches should be the same size ideally, what do you mean single large batch? > Since much of the work in the current sort implementation happens in the merge phase rather than `sort_batch`, I’m a bit concerned that the relative cost of metric reporting within `sort_batch` might become non-trivial in practice. I did not know that actually and these metrics would have shown me. also, it is depend on many things in order for it to be the case: 1. number of sort columns 2. sort columns type 3. merge degree ----- > I think the extended metrics in this PR are too fine-grained, since we are rarely checking them, and also it's possible to measure those metrics through the flamegraph ([datafusion.apache.org/library-user-guide/profiling.html](https://datafusion.apache.org/library-user-guide/profiling.html)), it might not worth to implement them as metrics. @2010YOUY01 metrics are made for production, flamegraph is where you actually can run it to reproduce. having this in metrics would actually benefit visibility. the only problem I have is the naming of the metrics, take to indices and lexsort are implementation detail, maybe we can update the naming to be better understood by people who are not aware of the internals like calculating sort indices for a single batch and sorting which is the `take` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org