SteNicholas commented on code in PR #7903:
URL: https://github.com/apache/hudi/pull/7903#discussion_r1101042060
##########
hudi-flink-datasource/hudi-flink/src/test/java/org/apache/hudi/table/ITTestHoodieDataSource.java:
##########
@@ -359,6 +360,39 @@ void testAppendWriteReadSkippingClustering() throws Exception {
assertRowsEquals(rows, TestData.DATA_SET_SOURCE_INSERT_LATEST_COMMIT);
}
+ @Test
+ void testAppendWriteWithClusteringBatchRead() throws Exception {
+ // create filesystem table named source
+ String createSource = TestConfigurations.getFileSourceDDL("source", 4);
+ streamTableEnv.executeSql(createSource);
+
+ String hoodieTableDDL = sql("t1")
+ .option(FlinkOptions.PATH, tempFile.getAbsolutePath())
+ .option(FlinkOptions.OPERATION, "insert")
+ .option(FlinkOptions.READ_STREAMING_SKIP_CLUSTERING, true)
+ .option(FlinkOptions.CLUSTERING_SCHEDULE_ENABLED,true)
+ .option(FlinkOptions.CLUSTERING_ASYNC_ENABLED, true)
+ .option(FlinkOptions.CLUSTERING_DELTA_COMMITS,2)
+ .option(FlinkOptions.CLUSTERING_TASKS, 1)
+ .option(FlinkOptions.CLEAN_RETAIN_COMMITS, 1)
+ .end();
+ streamTableEnv.executeSql(hoodieTableDDL);
+ String insertInto = "insert into t1 select * from source";
+ execInsertSql(streamTableEnv, insertInto);
+
+ streamTableEnv.getConfig().getConfiguration()
+ .setBoolean("table.dynamic-table-options.enabled", true);
+ final String query = String.format("select * from t1/*+ options('read.start-commit'='%s')*/",
+ FlinkOptions.START_COMMIT_EARLIEST);
+
+ List<RowData> expected = new ArrayList<>();
+ expected.addAll(TestData.DATA_SET_SOURCE_INSERT_FIRST_COMMIT);
+ expected.addAll(TestData.DATA_SET_SOURCE_INSERT_LATEST_COMMIT);
+ List<Row> rows = execSelectSql(streamTableEnv, query, 10);
Review Comment:
@hbgstc123, the reason for the above suggestion is that `execSelectSql`
starts a Flink job to collect the data of table t1, which is useful for
streaming reads, whereas a batch read only needs `streamTableEnv.sqlQuery` to
get the data of table t1. Otherwise the IT case would fail. You could run this
IT case locally to check.
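
To illustrate the batch-read path, here is a minimal sketch, not the exact change for this PR. The class `BatchReadSketch`, the helper `collectBatch`, and the inline `VALUES` query are hypothetical names used only for this example; in the IT case the environment would be `streamTableEnv` and the query would be the `select * from t1` statement. It assumes the Flink Table API and a planner are on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;
import org.apache.flink.types.Row;
import org.apache.flink.util.CloseableIterator;

// Hypothetical sketch class, not part of the Hudi code base.
public class BatchReadSketch {

  /**
   * Runs a bounded query through sqlQuery(...).execute().collect() and
   * gathers every row. The iterator ends once the bounded result is drained,
   * so no timeout is needed.
   */
  static List<Row> collectBatch(TableEnvironment tableEnv, String query) throws Exception {
    List<Row> rows = new ArrayList<>();
    // Closing the iterator releases the resources of the underlying job.
    try (CloseableIterator<Row> it = tableEnv.sqlQuery(query).execute().collect()) {
      it.forEachRemaining(rows::add);
    }
    return rows;
  }

  public static void main(String[] args) throws Exception {
    // Assumption: a standalone batch environment stands in for streamTableEnv.
    TableEnvironment tableEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
    // Assumption: an inline VALUES table stands in for the Hudi table t1.
    List<Row> rows = collectBatch(
        tableEnv, "SELECT * FROM (VALUES (1, 'a'), (2, 'b')) AS t(id, name)");
    rows.forEach(System.out::println);
  }
}
```

Because the result is bounded, the collected rows can be asserted directly, for example with `assertRowsEquals(rows, expected)`.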