watermelon12138 commented on PR #10143:
URL: https://github.com/apache/hudi/pull/10143#issuecomment-1837072821
@danny0405
Hello, Danny
I would like to ask why data with the same primary key is written to
different log files (with the same FileId but different timestamps) in upsert
mode. As a result, I cannot write a unit test for the LogIndex capability. My
test code is as follows:
`public void testHoodiePipelineBuilderSource() throws Exception {
  // create a StreamExecutionEnvironment instance
  StreamExecutionEnvironment execEnv =
      StreamExecutionEnvironment.getExecutionEnvironment();
  execEnv.getConfig().disableObjectReuse();
  execEnv.setParallelism(1);
  // set up checkpoint interval
  execEnv.enableCheckpointing(4000, CheckpointingMode.EXACTLY_ONCE);
  execEnv.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
  Configuration conf =
      TestConfigurations.getDefaultConf(tempFile.toURI().toString());
  conf.setString(FlinkOptions.TABLE_NAME, "t1");
  conf.setString(FlinkOptions.TABLE_TYPE, "MERGE_ON_READ");
  conf.setString(FlinkOptions.INDEX_TYPE, "BUCKET");
  conf.setInteger(FlinkOptions.BUCKET_INDEX_NUM_BUCKETS, 1);
  conf.setBoolean(FlinkOptions.LOG_INDEX_ENABLE, true);
  conf.setString(FlinkOptions.PRECOMBINE_FIELD, "ts");
  conf.setString(FlinkOptions.RECORD_KEY_FIELD, "uuid");
  conf.setBoolean(FlinkOptions.PRE_COMBINE, true);
  conf.setString(FlinkOptions.OPERATION, "upsert");
  // write 3 batches of data set
  TestData.writeData(TestData.dataSetInsert(1), conf);
  TestData.writeData(TestData.dataSetInsert(1), conf);`
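
One way to confirm the observation above (same FileId, different timestamps) is to group the log file names found under the table path by FileId. The following is only a rough sketch, not part of the test above: it assumes log file names look like `.<fileId>_<instant>.log.<version>_<writeToken>`, and the class name `LogFileGrouping`, the method `instantsByFileId`, and the sample file names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogFileGrouping {
  // Assumed naming pattern: .<fileId>_<instant>.log.<version>[_<writeToken>]
  private static final Pattern LOG_FILE =
      Pattern.compile("^\\.(.+)_(\\d+)\\.log\\.(\\d+)(_.+)?$");

  /** Groups instant times by FileId; names that do not match the pattern are skipped. */
  public static Map<String, List<String>> instantsByFileId(List<String> fileNames) {
    Map<String, List<String>> grouped = new TreeMap<>();
    for (String name : fileNames) {
      Matcher m = LOG_FILE.matcher(name);
      if (m.matches()) {
        grouped.computeIfAbsent(m.group(1), k -> new ArrayList<>()).add(m.group(2));
      }
    }
    return grouped;
  }

  public static void main(String[] args) {
    // Made-up sample names: one FileId, two different instant times.
    List<String> names = List.of(
        ".00000000-0000-0000-0000-000000000000_20231201100000000.log.1_0-1-0",
        ".00000000-0000-0000-0000-000000000000_20231201100005000.log.1_0-1-0");
    System.out.println(instantsByFileId(names));
  }
}
```

If a FileId maps to more than one instant, the batches were written to separate log files, which matches the behavior described in the question.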