xsa-dev opened a new pull request, #18573:
URL: https://github.com/apache/datafusion/pull/18573

   ## Description:
   This update introduces a new configuration option, 
`individual_expr_metrics`, allowing ProjectionExec to track execution time for 
each expression separately. When enabled, detailed profiling metrics will be 
generated for each expression, enhancing performance analysis in EXPLAIN 
ANALYZE output. The implementation includes modifications to the 
ProjectionStream to conditionally record metrics based on the configuration. 
Additionally, tests have been added to verify the correct behavior of the new 
feature when enabled and disabled.
   
   ## Which issue does this PR close?
   
   - Closes #18456
   
   ## Rationale for this change
   
   This PR addresses the need for granular expression-level performance 
profiling in DataFusion's EXPLAIN ANALYZE output. Currently, ProjectionExec 
only provides aggregate metrics for the entire operation, making it difficult 
to identify which specific expressions are performance bottlenecks. By adding 
individual expression metrics, users can gain deeper insights into query 
performance and optimize their queries more effectively.
   
   The implementation follows DataFusion's existing metrics collection patterns 
and integrates seamlessly with the current configuration system, ensuring 
backward compatibility and minimal performance overhead when disabled.
   
   ## What changes are included in this PR?
   
   1. **Added `individual_expr_metrics` configuration option** to 
enable/disable individual expression tracking
   2. **Modified `ProjectionStream`** to conditionally track metrics for each 
expression when enabled
   3. **Enhanced metrics collection** to support per-expression execution time 
tracking
   4. **Updated `EXPLAIN ANALYZE` output** to display individual expression 
metrics when enabled
   5. **Added comprehensive tests** to verify correct behavior in both enabled 
and disabled states
   6. **Updated documentation** for the new configuration option and metrics 
output format
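
The conditional tracking in items 1 to 3 can be pictured with a small standalone sketch. This is not the code from this PR and it does not use DataFusion's metrics types: `NamedExpr`, `ExprTime`, and `ProjectionSketch` are hypothetical stand-ins, and only the flag name `individual_expr_metrics` comes from the description above. It assumes one plausible shape for the idea: per-expression timers are allocated only when the flag is on, while the aggregate time is always recorded.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for a projection expression: a display name plus a
/// function that maps an input batch (here just a slice of i64) to a column.
struct NamedExpr {
    name: String,
    eval: fn(&[i64]) -> Vec<i64>,
}

/// Hypothetical per-expression metric: accumulated evaluation time.
#[derive(Debug, Default, Clone)]
struct ExprTime {
    name: String,
    elapsed: Duration,
}

/// Toy model of a projection stream (NOT DataFusion's `ProjectionStream`).
struct ProjectionSketch {
    exprs: Vec<NamedExpr>,
    /// `Some` only when the `individual_expr_metrics` flag is enabled.
    per_expr: Option<Vec<ExprTime>>,
    /// Aggregate time for the whole projection, recorded in both modes.
    total: Duration,
}

impl ProjectionSketch {
    fn new(exprs: Vec<NamedExpr>, individual_expr_metrics: bool) -> Self {
        // Allocate one timer per expression only if the flag is on.
        let per_expr: Option<Vec<ExprTime>> = individual_expr_metrics.then(|| {
            exprs
                .iter()
                .map(|e| ExprTime { name: e.name.clone(), elapsed: Duration::ZERO })
                .collect()
        });
        Self { exprs, per_expr, total: Duration::ZERO }
    }

    /// Evaluate every expression against one batch, timing each expression
    /// individually only when per-expression metrics are enabled.
    fn project(&mut self, batch: &[i64]) -> Vec<Vec<i64>> {
        let start = Instant::now();
        let mut columns = Vec::with_capacity(self.exprs.len());
        for (i, expr) in self.exprs.iter().enumerate() {
            match self.per_expr.as_mut() {
                Some(times) => {
                    let t = Instant::now();
                    columns.push((expr.eval)(batch));
                    times[i].elapsed += t.elapsed();
                }
                // Disabled path: no extra clock reads per expression.
                None => columns.push((expr.eval)(batch)),
            }
        }
        self.total += start.elapsed();
        columns
    }
}
```

In this sketch the disabled path performs no per-expression clock reads, which is one way the "minimal overhead when disabled" property could be achieved; whether the PR takes the same approach is not stated in the description.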
   
   ## Are these changes tested?
   
   Yes, this PR includes comprehensive test coverage:
   
   - **Unit tests** for the configuration option and metrics collection logic
   - **Integration tests** for EXPLAIN ANALYZE output with individual 
expression metrics
   - **Performance tests** to ensure minimal overhead when the feature is 
disabled
   - **Edge case tests** for various expression types and query patterns
   
   All tests pass successfully and the implementation maintains compatibility 
with existing functionality.
   
   ## Are there any user-facing changes?
   
   Yes. This PR adds one new configuration option and, when that option is 
enabled, additional per-expression timings in EXPLAIN ANALYZE output:
   
   **New Configuration:**
   - `individual_expr_metrics` - Boolean configuration option to enable/disable 
individual expression tracking
   
   **User Impact:**
   - ✅ **Positive**: Users can now see detailed per-expression timing in 
EXPLAIN ANALYZE output
   - ✅ **Backward Compatible**: Existing queries and metrics continue to work 
unchanged
   - ✅ **Optimization Friendly**: Enables better query optimization by 
identifying bottlenecks
   - ✅ **Configurable**: Optional feature with minimal performance overhead 
when disabled
   
   **No Breaking Changes:**
   - All existing APIs remain unchanged
   - No modifications to public method signatures
   - Existing EXPLAIN ANALYZE output format remains the same when the feature 
is disabled
   
   The changes follow DataFusion's API evolution guidelines and are fully 
backward compatible.
   
   ---
   
   **Additional Labels to Consider:**
   - `perf` - Performance improvement
   - `docs` - Documentation updated
   - `enhancement` - Feature enhancement
   - `project-exec` - Related to execution planning
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

