featzhang opened a new pull request, #27614:
URL: https://github.com/apache/flink/pull/27614

   # [table-planner] Enhance EXPLAIN output to display detailed estimated rowcount and cost information
   
   ## What is the purpose of the change
   
   Currently, Flink SQL EXPLAIN output includes the estimated rowcount and 
cumulative cost breakdown only when `ExplainDetail.ESTIMATED_COST` is 
requested, which sets the explain level to `ALL_ATTRIBUTES`. At the default 
`EXPPLAN_ATTRIBUTES` level this information is omitted, which makes it harder 
for users to understand query optimizer decisions and identify performance 
bottlenecks.
   
   This PR enhances the EXPLAIN output to display detailed estimated rowcount 
and cumulative cost information at the `EXPPLAN_ATTRIBUTES` level as well, 
improving query plan readability and facilitating performance analysis.
   
   ## Brief change log
   
   - Modified `RelTreeWriterImpl.scala` to display rowcount and cumulative cost 
at both `ALL_ATTRIBUTES` and `EXPPLAN_ATTRIBUTES` explain levels
   - Added null checks on the rowcount and cost values before they are written to the plan
   - Cost details now consistently show the `FlinkCost` breakdown in the 
format `{rows, cpu, io, network, memory}`
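
   The first two items can be pictured with a small, self-contained Java sketch. It is not the code in `RelTreeWriterImpl.scala`: the `Level` enum is only a stand-in for Calcite's `SqlExplainLevel`, `describe` and its parameters are hypothetical names, and dropping the suffix when a value is null is an assumption about what the null checks do.

```java
import java.util.EnumSet;
import java.util.Set;

final class ExplainSuffixSketch {
    // Hypothetical stand-in for Calcite's SqlExplainLevel (names only).
    enum Level { NO_ATTRIBUTES, EXPPLAN_ATTRIBUTES, DIGEST_ATTRIBUTES, NON_COST_ATTRIBUTES, ALL_ATTRIBUTES }

    // Levels at which the rowcount/cost suffix is printed after this change.
    static final Set<Level> COST_LEVELS =
            EnumSet.of(Level.EXPPLAN_ATTRIBUTES, Level.ALL_ATTRIBUTES);

    // Builds one operator line. Assumption: a null estimate suppresses the suffix.
    static String describe(String operator, Level level, Double rowCount, String cost) {
        StringBuilder sb = new StringBuilder(operator);
        if (COST_LEVELS.contains(level) && rowCount != null && cost != null) {
            sb.append(": rowcount = ").append(rowCount)
              .append(", cumulative cost = ").append(cost);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String cost = "{<cost details>}"; // placeholder, not real planner output
        System.out.println(describe("Join", Level.EXPPLAN_ATTRIBUTES, 100.0, cost));
        System.out.println(describe("Join", Level.NO_ATTRIBUTES, 100.0, cost));
        System.out.println(describe("Join", Level.ALL_ATTRIBUTES, null, cost));
    }
}
```

   In this sketch only the first call in `main` yields a line carrying the suffix; the other two fall back to the bare operator name.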
   
   ## Verifying this change
   
   This change can be verified by:
   
   1. **Manual Testing**: Execute any EXPLAIN statement in Flink SQL and 
observe that rowcount and cost information is now displayed by default:
      ```sql
      EXPLAIN SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
      ```
      
      **Before**: Only operator names and attributes are shown.
      **After**: Each operator line includes `: rowcount = <value>, cumulative cost = {<cost details>}`.
   
   2. **Existing Tests**: Run existing explain tests to ensure compatibility:
      ```bash
      mvn test -pl flink-table/flink-table-planner -Dtest=ExplainTest
      ```
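
   For the manual check, a reviewer could scan the operator lines of the printed plan for the suffix described above. The following Java sketch is one hedged way to do that. The class, method names, and regular expression are hypothetical and derived only from the suffix format quoted in this description, and it expects operator lines only (no section headers).

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class ExplainOutputCheck {
    // Matches a trailing ": rowcount = <value>, cumulative cost = {<cost details>}".
    static final Pattern COST_SUFFIX =
            Pattern.compile(": rowcount = [^,]+, cumulative cost = \\{[^}]*\\}\\s*$");

    // True if a single plan line ends with the rowcount/cost suffix.
    static boolean hasCostSuffix(String planLine) {
        return COST_SUFFIX.matcher(planLine).find();
    }

    // Returns the non-blank lines of a plan that lack the suffix.
    static List<String> linesMissingCost(String plan) {
        return plan.lines()
                .filter(line -> !line.isBlank())
                .filter(line -> !hasCostSuffix(line))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Invented plan text for illustration; not captured from a real Flink run.
        String plan = "Join(...): rowcount = 100.0, cumulative cost = {<cost details>}\n"
                + "   TableSourceScan(...)\n";
        System.out.println(linesMissingCost(plan));
    }
}
```

   An empty result from `linesMissingCost` would indicate that every operator line passed in carries the suffix.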
   
   ## Does this pull request potentially affect one of the following parts
   
   - Dependencies (does it add or upgrade a dependency): **No**
   - The public API, i.e., is any changed class annotated with 
`@Public(Evolving)`: **No**
   - The serializers: **No**
   - The runtime per-record code paths (performance sensitive): **No**
   - Anything that affects deployment or recovery: JobManager (and its 
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **No**
   - The S3 file system connector: **No**
   
   ## Documentation
   
   - Does this pull request introduce a new feature: **No** (Enhancement)
   - If yes, how is the feature documented: **Not applicable** (Existing 
EXPLAIN functionality)
   
   ## Additional Notes
   
   The change is minimal and focused on improving the user experience when 
analyzing query plans. The FlinkCost class already provides a comprehensive 
`toString()` method that formats all five cost dimensions (rows, cpu, io, 
network, memory), which is now properly exposed to users at the default explain 
level.
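
   To make the five cost dimensions concrete, here is a minimal Java sketch of a cost value with a brace-delimited `toString()`. `CostSketch` is a hypothetical class, not `FlinkCost` itself, and the exact text `FlinkCost.toString()` produces may differ from the layout assumed here. The `plus` method illustrates the general idea of a cumulative cost: an operator's own cost added to the cumulative cost of its inputs.

```java
final class CostSketch {
    final double rows;
    final double cpu;
    final double io;
    final double network;
    final double memory;

    CostSketch(double rows, double cpu, double io, double network, double memory) {
        this.rows = rows;
        this.cpu = cpu;
        this.io = io;
        this.network = network;
        this.memory = memory;
    }

    // Adds two costs dimension by dimension.
    CostSketch plus(CostSketch other) {
        return new CostSketch(rows + other.rows, cpu + other.cpu, io + other.io,
                network + other.network, memory + other.memory);
    }

    // Assumed layout: every dimension labelled, wrapped in braces.
    @Override
    public String toString() {
        return "{" + rows + " rows, " + cpu + " cpu, " + io + " io, "
                + network + " network, " + memory + " memory}";
    }

    public static void main(String[] args) {
        CostSketch scan = new CostSketch(100, 100, 50, 0, 0); // invented numbers
        CostSketch filter = new CostSketch(10, 100, 0, 0, 0); // invented numbers
        System.out.println(filter.plus(scan)); // cumulative cost of the filter in this sketch
    }
}
```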
   
   This enhancement will help users:
   - Better understand query optimizer cost estimations
   - Identify expensive operations more easily
   - Perform more effective performance tuning
   - Debug query plan issues with more context
   


