xiearthur opened a new issue, #12661:
URL: https://github.com/apache/hudi/issues/12661
Title: [HUDI-XXX][Flink] Unable to read new data in streaming mode with both earliest and specific timestamp
**Describe the problem you faced**
When using Flink to read a Hudi COW table in streaming mode, neither
`earliest` nor a specific timestamp allows reading new data written after the
Flink job starts. The streaming job only reads data present at its start time.
**To Reproduce**
```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.hudi.common.model.HoodieTableType;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.HoodiePipeline;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Map<String, String> options = new HashMap<>();
options.put(FlinkOptions.PATH.key(), basePath + tableName);
options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.COPY_ON_WRITE.name());
options.put(FlinkOptions.READ_AS_STREAMING.key(), "true");
// Case 1: earliest - only reads historical data, no new data
options.put(FlinkOptions.READ_START_COMMIT.key(), "earliest");
// Poll the timeline for new commits every 5 seconds
options.put(FlinkOptions.READ_STREAMING_CHECK_INTERVAL.key(), "5");
// Case 2: specific timestamp - same behavior
// options.put(FlinkOptions.READ_START_COMMIT.key(), "20240116000000");

HoodiePipeline.Builder builder = HoodiePipeline.builder(tableName)
        .options(options);
DataStream<RowData> rowDataDS = builder.source(env);
rowDataDS.print();
env.execute("hudi-streaming-read");
```
**Expected behavior**
The streaming job should continuously read new data written after the job
starts, regardless of whether `earliest` or a specific timestamp is used.
**Environment Description**
* Hudi version: 0.14.0
* Flink version: 1.16.0
* Hadoop version: 3.1.0
* Storage: HDFS