[GitHub] [hudi] danny0405 commented on a change in pull request #3046: [HUDI-1984] Support independent flink hudi compaction function

GitBox Mon, 07 Jun 2021 20:36:56 -0700


danny0405 commented on a change in pull request #3046:
URL: https://github.com/apache/hudi/pull/3046#discussion_r647085775




##########
File path: hudi-flink/src/test/java/org/apache/hudi/sink/StreamWriteITCase.java
##########
@@ -137,6 +145,60 @@ public void testWriteToHoodie() throws Exception {
     TestData.checkWrittenFullData(tempFile, EXPECTED);
   }
 
+  @Test
+  public void testHoodieFlinkCompactor() throws Exception {
+    // Create hoodie table and insert into data.
+    EnvironmentSettings settings = EnvironmentSettings.newInstance().inBatchMode().build();
+    TableEnvironment tableEnv = TableEnvironmentImpl.create(settings);
+    tableEnv.getConfig().getConfiguration()
+            .setInteger(ExecutionConfigOptions.TABLE_EXEC_RESOURCE_DEFAULT_PARALLELISM, 1);
+    Map<String, String> options = new HashMap<>();
+    options.put(FlinkOptions.PATH.key(), tempFile.getAbsolutePath());
+    options.put(FlinkOptions.TABLE_TYPE.key(), "MERGE_ON_READ");
+    String hoodieTableDDL = TestConfigurations.getCreateHoodieTableDDL("t1", options);
+    tableEnv.executeSql(hoodieTableDDL);
+    String insertInto = "insert into t1 values\n"
+            + "('id1','Danny',23,TIMESTAMP '1970-01-01 00:00:01','par1'),\n"
+            + "('id2','Stephen',33,TIMESTAMP '1970-01-01 00:00:02','par1'),\n"
+            + "('id3','Julian',53,TIMESTAMP '1970-01-01 00:00:03','par2'),\n"
+            + "('id4','Fabian',31,TIMESTAMP '1970-01-01 00:00:04','par2'),\n"
+            + "('id5','Sophia',18,TIMESTAMP '1970-01-01 00:00:05','par3'),\n"
+            + "('id6','Emma',20,TIMESTAMP '1970-01-01 00:00:06','par3'),\n"
+            + "('id7','Bob',44,TIMESTAMP '1970-01-01 00:00:07','par4'),\n"
+            + "('id8','Han',56,TIMESTAMP '1970-01-01 00:00:08','par4')";
+    TableResult tableResult = tableEnv.executeSql(insertInto);
+    TimeUnit.SECONDS.sleep(5);
+    tableResult.await();
+
+    // Make configuration and setAvroSchema.
+    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
+    FlinkCompactionConfig cfg = new FlinkCompactionConfig();
+    cfg.path = tempFile.getAbsolutePath();
+    cfg.hoodieTableName = "t1";
+    Configuration conf = FlinkCompactionConfig.toCompactionConfig(cfg);
+    conf.setString(FlinkOptions.TABLE_TYPE.key(), "MERGE_ON_READ");
+    conf.setString(FlinkOptions.PARTITION_PATH_FIELD.key(), "partition");
+
+    // set table schema.
+    CompactionUtil.setAvroSchema(conf);
+
+    env.addSource(new CompactionCommitSource(conf))
+        .name("compaction_source")
+        .uid("uid_compaction_source")
+        .keyBy(event -> event.getOperation().hashCode())

Review comment:
       This is not a bottleneck. What partitioning strategy do you suggest?
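
       For context on the `keyBy` line under discussion, here is a minimal plain-Java sketch (not code from the PR) of how keying events by the hash of their operation spreads them over parallel subtasks. It assumes a simplified "hash modulo parallelism" routing; Flink's actual assignment goes through key groups and a murmur hash, so the subtask indices here are illustrative only. The class name `KeyRoutingSketch`, its methods, and the string operation ids are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: simulates key-based routing without any Flink dependency.
final class KeyRoutingSketch {

  // ASSUMPTION: plain modulo routing. Flink really maps keys to key groups
  // via a murmur hash and then maps key groups to subtasks.
  public static int subtaskFor(Object key, int parallelism) {
    return Math.floorMod(key.hashCode(), parallelism);
  }

  // Counts how many events each simulated subtask would receive.
  public static Map<Integer, Integer> countPerSubtask(List<String> operations, int parallelism) {
    Map<Integer, Integer> counts = new HashMap<>();
    for (String op : operations) {
      // Same shape as keyBy(event -> event.getOperation().hashCode()):
      // the key is the hash code of the (hypothetical) operation id.
      int key = op.hashCode();
      counts.merge(subtaskFor(key, parallelism), 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    // Hypothetical operation ids; equal operations always share a subtask.
    List<String> ops = List.of("op-a", "op-a", "op-b", "op-a");
    System.out.println(countPerSubtask(ops, 4));
  }
}
```

       Under this simplification, events with the same operation always reach the same subtask, and distinct operations spread out only as far as their hash codes differ modulo the parallelism.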




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
