featzhang opened a new pull request, #27614:
URL: https://github.com/apache/flink/pull/27614
# [table-planner] Enhance EXPLAIN output to display detailed estimated rowcount and cost information
## What is the purpose of the change
Currently, EXPLAIN statements in Flink SQL display detailed cost information
(rowcount and a cumulative cost breakdown) only when
`ExplainDetail.ESTIMATED_COST` is specified, which sets the explain level to
`ALL_ATTRIBUTES`. At the default `EXPPLAN_ATTRIBUTES` level this information
is omitted, which makes it harder for users to understand query optimizer
decisions and identify performance bottlenecks.
This PR enhances the EXPLAIN output to display detailed estimated rowcount
and cumulative cost information at the `EXPPLAN_ATTRIBUTES` level as well,
improving query plan readability and facilitating performance analysis.
## Brief change log
- Modified `RelTreeWriterImpl.scala` to display rowcount and cumulative cost
at both `ALL_ATTRIBUTES` and `EXPPLAN_ATTRIBUTES` explain levels
- Added null checks for rowcount and cost values to improve error handling
- Cost details now consistently show the FlinkCost breakdown in the format
`{rows, cpu, io, network, memory}`
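
The change log above can be illustrated with a minimal, self-contained sketch. It is not the code in `RelTreeWriterImpl.scala`: `ExplainSuffix`, `ExplainLevel`, and `Cost` are hypothetical stand-ins for the writer, Calcite's explain level enum, and `FlinkCost`. Returning an empty suffix when a value is null is an assumption about how the null checks behave.

```java
// Hypothetical sketch of the suffix logic described in the change log.
final class ExplainSuffix {

    // Builds the text appended to an operator line in the plan.
    static String format(ExplainLevel level, Double rowCount, Cost cumulativeCost) {
        boolean showCost = level == ExplainLevel.ALL_ATTRIBUTES
            || level == ExplainLevel.EXPPLAN_ATTRIBUTES;
        // Assumption: a missing estimate suppresses the suffix instead of failing.
        if (!showCost || rowCount == null || cumulativeCost == null) {
            return "";
        }
        return ": rowcount = " + rowCount + ", cumulative cost = " + cumulativeCost;
    }

    public static void main(String[] args) {
        Cost cost = new Cost(100.0, 200.0, 0.0, 0.0, 0.0);
        System.out.println("Calc(select=[id])"
            + format(ExplainLevel.EXPPLAN_ATTRIBUTES, 100.0, cost));
        System.out.println("Calc(select=[id])"
            + format(ExplainLevel.NO_ATTRIBUTES, 100.0, cost));
    }
}

// Stand-in for the explain level enum, reduced to the levels discussed here.
enum ExplainLevel { NO_ATTRIBUTES, EXPPLAN_ATTRIBUTES, ALL_ATTRIBUTES }

// Stand-in for FlinkCost: the five cost dimensions named in the change log.
final class Cost {
    final double rows, cpu, io, network, memory;

    Cost(double rows, double cpu, double io, double network, double memory) {
        this.rows = rows;
        this.cpu = cpu;
        this.io = io;
        this.network = network;
        this.memory = memory;
    }

    @Override
    public String toString() {
        return "{" + rows + " rows, " + cpu + " cpu, " + io + " io, "
            + network + " network, " + memory + " memory}";
    }
}
```

In this sketch both `ALL_ATTRIBUTES` and `EXPPLAN_ATTRIBUTES` produce the same suffix, while any other level leaves the operator line unchanged.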
## Verifying this change
This change can be verified by:
1. **Manual Testing**: Execute any EXPLAIN statement in Flink SQL and
observe that rowcount and cost information is now displayed by default:
```sql
EXPLAIN SELECT * FROM table1 JOIN table2 ON table1.id = table2.id;
```
**Before**: Only operator names and attributes shown
**After**: Each operator line includes
`: rowcount = <value>, cumulative cost = {<cost details>}`
2. **Existing Tests**: Run existing explain tests to ensure compatibility:
```bash
mvn test -pl flink-table/flink-table-planner -Dtest=ExplainTest
```
## Does this pull request potentially affect one of the following parts
- Dependencies (does it add or upgrade a dependency): **No**
- The public API, i.e., is any changed class annotated with
`@Public(Evolving)`: **No**
- The serializers: **No**
- The runtime per-record code paths (performance sensitive): **No**
- Anything that affects deployment or recovery: JobManager (and its
components), Checkpointing, Kubernetes/Yarn, ZooKeeper: **No**
- The S3 file system connector: **No**
## Documentation
- Does this pull request introduce a new feature: **No** (Enhancement)
- If yes, how is the feature documented: **Not applicable** (Existing
EXPLAIN functionality)
## Additional Notes
The change is minimal and focused on improving the user experience when
analyzing query plans. The FlinkCost class already provides a comprehensive
`toString()` method that formats all five cost dimensions (rows, cpu, io,
network, memory), which is now properly exposed to users at the default explain
level.
This enhancement will help users:
- Better understand query optimizer cost estimations
- Identify expensive operations more easily
- Perform more effective performance tuning
- Debug query plan issues with more context
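
As a rough illustration of how the extra output could support this kind of analysis, the sketch below scans plan text for the `: rowcount = <value>, cumulative cost = {...}` suffix and ranks operators by estimated rowcount. `PlanCostScanner` is a hypothetical helper, not part of Flink, and the sample plan in `main` is invented for illustration. It assumes one operator per line, with tree prefixes made only of spaces, `:`, `+`, and `-`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that ranks plan operators by estimated rowcount.
final class PlanCostScanner {

    // Assumed line shape: "<tree prefix><operator>: rowcount = <n>, cumulative cost = {...}"
    private static final Pattern LINE = Pattern.compile(
        "^[\\s:+\\-]*(.*?): rowcount = ([^,]+), cumulative cost = .*$");

    static final class Entry {
        final String operator;
        final double rowCount;

        Entry(String operator, double rowCount) {
            this.operator = operator;
            this.rowCount = rowCount;
        }
    }

    // Returns the operators that carry a rowcount, largest estimate first.
    static List<Entry> rankByRowCount(String plan) {
        List<Entry> entries = new ArrayList<>();
        for (String line : plan.split("\\R")) {
            Matcher m = LINE.matcher(line);
            if (!m.matches()) {
                continue; // section headers and lines without cost info
            }
            try {
                entries.add(new Entry(m.group(1), Double.parseDouble(m.group(2).trim())));
            } catch (NumberFormatException e) {
                // Skip lines whose rowcount is not a plain number.
            }
        }
        entries.sort(Comparator.comparingDouble((Entry e) -> e.rowCount).reversed());
        return entries;
    }

    public static void main(String[] args) {
        // Invented sample text, not real Flink output.
        String plan = String.join("\n",
            "== Optimized Physical Plan ==",
            "Join(joinType=[InnerJoin]): rowcount = 50.0, cumulative cost = {50.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}",
            ":- TableSourceScan(table=[[t1]]): rowcount = 1.0E8, cumulative cost = {1.0E8 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}",
            "+- TableSourceScan(table=[[t2]]): rowcount = 100.0, cumulative cost = {100.0 rows, 0.0 cpu, 0.0 io, 0.0 network, 0.0 memory}");
        for (Entry e : rankByRowCount(plan)) {
            System.out.println(e.rowCount + "  " + e.operator);
        }
    }
}
```

Ranking by rowcount is only one possible heuristic. The same pattern could be extended to pull individual cost dimensions out of the braces.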