simonsssu commented on a change in pull request #1515:
URL: https://github.com/apache/iceberg/pull/1515#discussion_r502376629
##########
File path: flink/src/test/java/org/apache/iceberg/flink/sink/TestFlinkIcebergSink.java
##########
@@ -145,6 +147,38 @@ public void testWriteRowData() throws Exception {
SimpleDataUtil.assertTableRows(tablePath, expectedRows);
}
+ @Test
+ public void testWriteRowDataWithoutCheckpoint() throws Exception {
+ List<Row> rows = Lists.newArrayList(
+ Row.of(1, "hello"),
+ Row.of(2, "world"),
+ Row.of(3, "foo")
+ );
+
+ env = StreamExecutionEnvironment.getExecutionEnvironment()
+ .setParallelism(parallelism)
+ .setMaxParallelism(parallelism);
+
+ DataStream<RowData> dataStream = env.addSource(new NonCheckpointFiniteTestSource<>(rows), ROW_TYPE_INFO)
+     .map(CONVERTER::toInternal, RowDataTypeInfo.of(SimpleDataUtil.ROW_TYPE));
+
+ org.apache.flink.configuration.Configuration flinkConf = new org.apache.flink.configuration.Configuration();
+ flinkConf.setLong(FlinkSink.FLINK_ICEBERG_SINK_FLUSHINTERVAL, 100L);
+
+ FlinkSink.forRowData(dataStream)
+ .table(table)
Review comment:
You are right, there's no need to add this.
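(Side note, not part of the diff: the NonCheckpointFiniteTestSource used above is added elsewhere in this PR and isn't shown here. Purely as an illustration of the idea, a source like the hypothetical sketch below would emit its rows and then stay alive a bit longer than the configured commit interval, so the timer-based commit can fire even though checkpointing is disabled. The class name, sleep duration, and structure are assumptions for illustration, not the PR's actual code.)

```java
// Hypothetical sketch only -- not the PR's NonCheckpointFiniteTestSource.
// A bounded source that emits its elements, then lingers so a processing-time
// based commit has a chance to run before the job finishes.
import java.util.List;
import org.apache.flink.streaming.api.functions.source.SourceFunction;

public class NonCheckpointFiniteTestSourceSketch<T> implements SourceFunction<T> {
  private final List<T> elements;
  private volatile boolean running = true;

  public NonCheckpointFiniteTestSourceSketch(List<T> elements) {
    this.elements = elements;
  }

  @Override
  public void run(SourceContext<T> ctx) throws Exception {
    for (T element : elements) {
      if (!running) {
        return;
      }
      synchronized (ctx.getCheckpointLock()) {
        ctx.collect(element);
      }
    }
    // Keep the source alive longer than the configured commit interval (100 ms in the
    // test above) so the committer's timer fires before the pipeline shuts down.
    Thread.sleep(1000L);
  }

  @Override
  public void cancel() {
    running = false;
  }
}
```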
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -103,10 +107,26 @@
private static final ListStateDescriptor<SortedMap<Long, List<DataFile>>> STATE_DESCRIPTOR = buildStateDescriptor();
private transient ListState<SortedMap<Long, List<DataFile>>> checkpointsState;
- IcebergFilesCommitter(TableLoader tableLoader, Configuration hadoopConf, boolean replacePartitions) {
+ IcebergFilesCommitter(TableLoader tableLoader, Configuration hadoopConf, boolean replacePartitions,
+     org.apache.flink.configuration.Configuration flinkConf) {
this.tableLoader = tableLoader;
this.hadoopConf = new SerializableConfiguration(hadoopConf);
this.replacePartitions = replacePartitions;
+ this.flinkConf = flinkConf;
+ this.flushCommitInterval = flinkConf.getLong(FlinkSink.FLINK_ICEBERG_SINK_FLUSHINTERVAL,
+     FlinkSink.DEFAULT_FLINK_ICEBERG_SINK_FLUSHINTERVAL);
+ }
+
+ @Override
+ public void open() throws Exception {
+ super.open();
+ boolean isCheckpointEnabled = getRuntimeContext().isCheckpointingEnabled();
+ // If checkpointing is not enabled, we use the processingTimeService to trigger commits.
+ if (!isCheckpointEnabled) {
+ ProcessingTimeService processingTimeService = getRuntimeContext().getProcessingTimeService();
+ final long currentTimestamp = processingTimeService.getCurrentProcessingTime();
+ processingTimeService.registerTimer(currentTimestamp + flushCommitInterval, this);
+ }
}
@Override
Review comment:
Yes, initializeState will be invoked even when checkpointing is disabled, and
context.isRestored will be false.
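To make the commit-without-checkpoint path easier to follow, here is a minimal, self-contained sketch of the pattern (not the PR's actual code): a one-shot timer that commits and then re-registers itself every commit interval. A plain ScheduledExecutorService stands in for Flink's ProcessingTimeService, whose registerTimer(timestamp, callback) is likewise one-shot, so presumably the real committer re-registers in its onProcessingTime callback, which this hunk doesn't show. All names below are made up for the example.

```java
// Sketch only: illustrates the re-registering timer pattern discussed above, using a
// plain ScheduledExecutorService instead of Flink's ProcessingTimeService.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicCommitSketch {
  private final ScheduledExecutorService timerService = Executors.newSingleThreadScheduledExecutor();
  private final long commitIntervalMs;

  public PeriodicCommitSketch(long commitIntervalMs) {
    this.commitIntervalMs = commitIntervalMs;
  }

  // Mirrors open(): only arm the timer when checkpointing is disabled, because with
  // checkpointing enabled the commits are driven by notifyCheckpointComplete instead.
  public void start(boolean checkpointingEnabled) {
    if (!checkpointingEnabled) {
      timerService.schedule(this::onTimer, commitIntervalMs, TimeUnit.MILLISECONDS);
    }
  }

  // Plays the role of the ProcessingTimeCallback: commit whatever is pending, then
  // schedule the next firing, since each registration is one-shot.
  private void onTimer() {
    commitPendingFiles();
    timerService.schedule(this::onTimer, commitIntervalMs, TimeUnit.MILLISECONDS);
  }

  private void commitPendingFiles() {
    // Stand-in for committing the buffered DataFiles to the Iceberg table.
    System.out.println("commit pending data files");
  }
}
```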
##########
File path: flink/src/test/java/org/apache/iceberg/flink/sink/TestIcebergFilesCommitter.java
##########
@@ -528,7 +528,8 @@ private static TestOperatorFactory of(String tablePath) {
@Override
@SuppressWarnings("unchecked")
public <T extends StreamOperator<Void>> T createStreamOperator(StreamOperatorParameters<Void> param) {
- IcebergFilesCommitter committer = new IcebergFilesCommitter(TableLoader.fromHadoopTable(tablePath), CONF, false);
+ IcebergFilesCommitter committer = new IcebergFilesCommitter(TableLoader.fromHadoopTable(tablePath), CONF, false,
Review comment:
You are right, I will add more UTs.
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/FlinkSink.java
##########
@@ -153,6 +158,11 @@ public Builder hadoopConf(Configuration newHadoopConf) {
return this;
}
+ public Builder flinkConf(org.apache.flink.configuration.Configuration config) {
+ this.flinkConf = config != null ? config : new org.apache.flink.configuration.Configuration();
+ return this;
+ }
Review comment:
I think it's better to pass the whole conf here in case we want to add more
parameters to control the behavior later, although currently we only need an
interval.
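For example (a sketch only, reusing dataStream and table from the test earlier in this thread; the builder methods and the FLINK_ICEBERG_SINK_FLUSHINTERVAL key come from this PR and may still be renamed):

```java
// Carry sink tuning options in a single Flink Configuration...
org.apache.flink.configuration.Configuration flinkConf = new org.apache.flink.configuration.Configuration();
flinkConf.setLong(FlinkSink.FLINK_ICEBERG_SINK_FLUSHINTERVAL, 100L);

// ...and hand the whole conf to the builder, so future options don't need new builder methods.
FlinkSink.forRowData(dataStream)
    .table(table)
    .flinkConf(flinkConf)
    .build();
```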
##########
File path: flink/src/main/java/org/apache/iceberg/flink/sink/IcebergFilesCommitter.java
##########
@@ -73,7 +75,9 @@
// TableLoader to load iceberg table lazily.
private final TableLoader tableLoader;
private final SerializableConfiguration hadoopConf;
+ private final org.apache.flink.configuration.Configuration flinkConf;
private final boolean replacePartitions;
+ private final long flushCommitInterval;
Review comment:
Yes, here I think it's better to use commitInterval rather than
flushCommitInterval.