xiearthur opened a new issue, #12661:
URL: https://github.com/apache/hudi/issues/12661
Title: [HUDI-XXX][Flink] Unable to read new data in streaming mode with both earliest and specific timestamp
**Describe the problem you faced**
When using Flink to read a Hudi COW table in streaming mode, neither
`earliest` nor a specific timestamp allows reading new data written after the
Flink job starts. The streaming job only reads data present at its start time.
**To Reproduce**
```java
import java.util.HashMap;
import java.util.Map;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.data.RowData;
import org.apache.hudi.common.model.HoodieTableType;
import org.apache.hudi.configuration.FlinkOptions;
import org.apache.hudi.util.HoodiePipeline;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

Map<String, String> options = new HashMap<>();
options.put(FlinkOptions.PATH.key(), basePath + tableName);
options.put(FlinkOptions.TABLE_TYPE.key(), HoodieTableType.COPY_ON_WRITE.name());
options.put(FlinkOptions.READ_AS_STREAMING.key(), "true");
// Case 1: earliest - only reads historical data, no new data
options.put(FlinkOptions.READ_START_COMMIT.key(), "earliest");
// Poll the timeline for new commits every 5 seconds
options.put(FlinkOptions.READ_STREAMING_CHECK_INTERVAL.key(), "5");
// Case 2: specific timestamp - same behavior
// options.put(FlinkOptions.READ_START_COMMIT.key(), "20240116000000");

HoodiePipeline.Builder builder = HoodiePipeline.builder(tableName)
        .options(options);
DataStream<RowData> rowDataDS = builder.source(env);
rowDataDS.print();
env.execute("hudi-streaming-read");
```
**Expected behavior**
The streaming job should continuously read new data written after the job
starts, regardless of whether `earliest` or a specific timestamp is used.
**Environment Description**
* Hudi version: 0.14.0
* Flink version: 1.16.0
* Hadoop version: 3.1.0
* Storage: HDFS