2010YOUY01 opened a new issue, #18456:
URL: https://github.com/apache/datafusion/issues/18456

   ### Is your feature request related to a problem or challenge?
   
   Originally suggested by @simonvandel in 
https://github.com/apache/datafusion/issues/18217#issuecomment-3472801129
   
   > ProjectionExec currently shows the total elapsed time for all expressions. 
Would it make sense to show timings for each individual expression? Perhaps we 
could do it in VERBOSE mode
   
   Given query (executed in `datafusion-cli`:
   ```
   > set datafusion.explain.analyze_level = summary;
   0 row(s) fetched.
   Elapsed 0.001 seconds.
   
   > explain analyze select pow(v1,2), abs(round(sqrt(v1)))
   from generate_series(1000) as t1(v1);
   
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                   
                                                                                
                                                                       |
   
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | ProjectionExec: expr=[power(v1@0, 2) as 
pow(t1.v1,Int64(2)), abs(round(sqrt(CAST(v1@0 AS Float64)))) as 
abs(round(sqrt(t1.v1)))], metrics=[output_rows=1001, elapsed_compute=167.513µs, 
output_bytes=15.8 KB] |
   |                   |   RepartitionExec: partitioning=RoundRobinBatch(14), 
input_partitions=1, metrics=[spill_count=0, spilled_bytes=0.0 B, 
spilled_rows=0, fetch_time=48.126µs, repartition_time=1ns, send_time=6.055µs]   
        |
   |                   |     ProjectionExec: expr=[value@0 as v1], 
metrics=[output_rows=1001, elapsed_compute=2.458µs, output_bytes=64.0 KB]       
                                                                                
    |
   |                   |       LazyMemoryExec: partitions=1, 
batch_generators=[generate_series: start=0, end=1000, batch_size=8192], 
metrics=[output_rows=1001, elapsed_compute=35.208µs, output_bytes=64.0 KB]      
                  |
   |                   |                                                        
                                                                                
                                                                       |
   
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   1 row(s) fetched.
   Elapsed 0.006 seconds.
   ```
   In `ProjectExec`'s metric, we can track each expressions evaluation time 
separately like `pow(v1,2): 10ms, abs(round(sqrt(v1))): 30ms`, this should be 
easy to implement.
   
   Additionally, we can also implement fine-grained execution time tracking for 
nested expression like `sqrt(v1): 5ms, round(sqrt(v1)): 10ms, 
abs(round(sqrt(v1))): 15ms`.
   I have briefly checked the implementation, it requires a) public API change 
that enables metrics tracking for `Expr`s b) Update implementation for 
expressions.
   I think we don't have to implement that for all builtin `Expr`s, the math 
expressions as showed in the example are usually very fast, however there are 
some expressions that are slower to evaluate, and regularly cause performance 
issue (e.g. `array_` functions, regex functions), it can be helpful to track 
execution time for them.
   
   ### Describe the solution you'd like
   
   - [ ] Implement per-expression metrics tracking in `ProjectionExec`
   - [ ] Support fine-grained execution time metric for individual `Expr`s in a 
nested expression.
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to