featzhang opened a new pull request, #27611:
URL: https://github.com/apache/flink/pull/27611

   ### What is the purpose of the change
   
   This change enhances the EXPLAIN plan output for TableScan nodes by 
explicitly displaying watermark specifications. Currently, watermark logic is 
difficult to identify in EXPLAIN plans, which creates challenges for streaming 
users trying to understand and debug watermark strategies.
   
   **Before:**
   ```
   TableScan(table=[[default_catalog, default_database, orders]], 
fields=[user_id, order_time])
   ```
   
   **After:**
   ```
   TableScan:
     table: [[default_catalog, default_database, orders]]
     fields: user_id, order_time
     watermark: order_time - order_time - INTERVAL '5' SECOND
   ```
   
   This improvement makes watermark strategies immediately visible in query 
plans, helping streaming users:
   - Quickly understand watermark configurations
   - Debug late data handling issues
   - Verify watermark logic without examining table DDL
   - Improve overall development and troubleshooting experience
   
   ### Brief change log
   
   - Modified `FlinkLogicalTableSourceScan.explainTerms()` to extract and 
display watermark specifications
   - Added watermark information retrieval from `ResolvedSchema` via 
`TableSourceTable`
   - Enhanced explain output to show rowtime attribute and watermark expression 
in a readable format
   - Watermark is displayed only when present, maintaining backward 
compatibility for non-streaming tables
   
   ### Verifying this change
   
   This change can be verified by:
   
   1. **Compilation**: The module compiles successfully without errors
      ```bash
      ./mvnw clean spotless:apply install -DskipTests -Pfast -pl 
flink-table/flink-table-planner
      ```
   
   2. **Create table with watermark**: 
      ```sql
      CREATE TABLE orders (
        user_id INT,
        order_time TIMESTAMP(3),
        WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
      ) WITH (...);
      ```
   
   3. **Verify EXPLAIN output**:
      ```sql
      EXPLAIN SELECT * FROM orders;
      ```
      The output should display watermark information in the TableScan node.
   
   4. **Test without watermark**: Verify that tables without watermarks still 
work normally
      ```sql
      CREATE TABLE batch_table (id INT, name STRING) WITH (...);
      EXPLAIN SELECT * FROM batch_table;
      ```
   
   ### Does this pull request potentially affect
   
   - **Dependencies**: No
   - **The public API**: No (only changes internal explain output format)
   - **The serializers**: No
   - **The runtime per-record code paths**: No (only affects plan display)
   - **Anything that affects deployment or recovery**: No
   
   ### Documentation
   
   - This change enhances the readability of query plan output for streaming 
jobs
   - No user-facing API documentation changes required
   - The improvement is visible in EXPLAIN plan output
   - Maintains backward compatibility by only enhancing display format, not 
changing RelNode structure
   - Particularly valuable for streaming processing users working with event 
time and watermarks
   
   ---
   
   ## 中文版本
   
   ### 变更目的
   
   此变更通过在 TableScan 节点中显式展示 watermark 规范来增强 EXPLAIN 计划输出。当前的 EXPLAIN 输出很难看出 
watermark 逻辑,这给尝试理解和调试 watermark 策略的流处理用户带来了挑战。
   
   **修改前:**
   ```
   TableScan(table=[[default_catalog, default_database, orders]], 
fields=[user_id, order_time])
   ```
   
   **修改后:**
   ```
   TableScan:
     table: [[default_catalog, default_database, orders]]
     fields: user_id, order_time
     watermark: order_time - order_time - INTERVAL '5' SECOND
   ```
   
   这个改进使得 watermark 策略在查询计划中立即可见,帮助流处理用户:
   - 快速理解 watermark 配置
   - 调试迟到数据处理问题
   - 在不查看表 DDL 的情况下验证 watermark 逻辑
   - 改善整体开发和故障排查体验
   
   ### 主要变更列表
   
   - 修改 `FlinkLogicalTableSourceScan.explainTerms()` 以提取和显示 watermark 规范
   - 通过 `TableSourceTable` 从 `ResolvedSchema` 添加 watermark 信息检索
   - 增强 explain 输出,以可读格式显示 rowtime 属性和 watermark 表达式
   - 仅在存在 watermark 时显示,保持对非流式表的向后兼容性
   
   ### 验证此变更
   
   可以通过以下方式验证此变更:
   
   1. **编译验证**: 模块成功编译,无错误
      ```bash
      ./mvnw clean spotless:apply install -DskipTests -Pfast -pl 
flink-table/flink-table-planner
      ```
   
   2. **创建带 watermark 的表**:
      ```sql
      CREATE TABLE orders (
        user_id INT,
        order_time TIMESTAMP(3),
        WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND
      ) WITH (...);
      ```
   
   3. **验证 EXPLAIN 输出**:
      ```sql
      EXPLAIN SELECT * FROM orders;
      ```
      输出应该在 TableScan 节点中显示 watermark 信息。
   
   4. **测试无 watermark 情况**: 验证没有 watermark 的表仍然正常工作
      ```sql
      CREATE TABLE batch_table (id INT, name STRING) WITH (...);
      EXPLAIN SELECT * FROM batch_table;
      ```
   
   ### 此拉取请求可能影响的方面
   
   - **依赖项**: 否
   - **公共 API**: 否(仅更改内部 explain 输出格式)
   - **序列化器**: 否
   - **运行时每条记录代码路径**: 否(仅影响计划显示)
   - **任何影响部署或恢复的内容**: 否
   
   ### 文档
   
   - 此变更增强了流式作业查询计划输出的可读性
   - 不需要更改面向用户的 API 文档
   - 改进在 EXPLAIN 计划输出中可见
   - 通过仅增强显示格式而不更改 RelNode 结构来维持向后兼容性
   - 对使用事件时间和 watermark 的流处理用户特别有价值
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to