xiearthur opened a new issue, #12660:
URL: https://github.com/apache/hudi/issues/12660
**Describe the problem you faced**
When using Flink to read a Hudi COW table in streaming mode, the value of
READ_START_COMMIT leads to different behaviors:
- With "earliest": the job continuously reads both historical and new data
- With a specific timestamp: the job only reads data committed before the
Flink job started, and misses new data written after that
**To Reproduce**
```java
import java.util.HashMap;
import java.util.Map;

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.hudi.common.model.HoodieTableType;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.HoodiePipeline;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Map<String, String> options = new HashMap<>();
options.put(FlinkOptions.PATH.key(), basePath + tableName);
options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.COPY_ON_WRITE.name());
options.put(FlinkOptions.READ_AS_STREAMING.key(), "true");

// Case 1: works for continuous streaming but reads all history
options.put(FlinkOptions.READ_START_COMMIT.key(), "earliest");
// Case 2: only reads data up to job start time
// options.put(FlinkOptions.READ_START_COMMIT.key(), "20240116000000");

HoodiePipeline.Builder builder = HoodiePipeline.builder(tableName)
    .options(options);
DataStream<RowData> rowDataDS = builder.source(env);
```
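For comparison, the same streaming reader can also be expressed in Flink SQL. This is only a sketch: the table name, schema, and path below are placeholders, not values from the actual job.

```sql
-- Sketch of an equivalent Flink SQL reader; table name, columns,
-- and path are placeholders for illustration.
CREATE TABLE hudi_source (
  id STRING,
  ts TIMESTAMP(3)
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///path/to/table',
  'table.type' = 'COPY_ON_WRITE',
  'read.streaming.enabled' = 'true',
  -- same two cases as the Java snippet above:
  -- 'read.start-commit' = 'earliest'
  'read.start-commit' = '20240116000000'
);
```

Reproducing the issue through the SQL connector would help narrow down whether the behavior is specific to the HoodiePipeline API or to the streaming source itself.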
**Expected behavior**
With a specific READ_START_COMMIT timestamp, the streaming job should:
1. Start reading from the specified commit timestamp
2. Continue receiving new data written after the job starts
**Environment Description**
* Hudi version: 0.14.0
* Flink version: 1.16.0
* Hadoop version: 3.1.0
* Storage: HDFS