Jiabao-Sun commented on code in PR #3510:
URL: https://github.com/apache/flink-cdc/pull/3510#discussion_r1705082918
##########
flink-cdc-connect/flink-cdc-source-connectors/flink-cdc-base/src/main/java/org/apache/flink/cdc/connectors/base/source/assigner/splitter/ChunkSplitter.java:
##########
@@ -28,6 +29,36 @@
@Experimental
public interface ChunkSplitter {
+ /**
+ * Called to open the chunk splitter to acquire any resources, like
threads or jdbc connections.
+ */
+ void open();
+
/** Generates all snapshot splits (chunks) for the give data collection. */
- Collection<SnapshotSplit> generateSplits(TableId tableId);
+ Collection<SnapshotSplit> generateSplits(TableId tableId) throws Exception;
+
+ /** Get whether the splitter has more chunks for current table. */
+ boolean hasNextChunk();
+
+ /**
+ * Creates a snapshot of the state of this chunk splitter, to be stored in
a checkpoint.
+ *
+ * <p>This method takes the ID of the checkpoint for which the state is
snapshotted. Most
+ * implementations should be able to ignore this parameter, because for
the contents of the
+ * snapshot, it doesn't matter for which checkpoint it gets created. This
parameter can be
+ * interesting for source connectors with external systems where those
systems are themselves
+ * aware of checkpoints; for example in cases where the enumerator
notifies that system about a
+ * specific checkpoint being triggered.
+ *
+ * @param checkpointId The ID of the checkpoint for which the snapshot is
created.
+ * @return an object containing the state of the split enumerator.
+ */
+ ChunkSplitterState snapshotState(long checkpointId);
+
+ TableId getCurrentSplittingTableId();
+
+ /**
+ * Called to open the chunk splitter to release any resources, like
threads or jdbc connections.
+ */
+ void close() throws Exception;
Review Comment:
Thanks @loserwang1024 for this great work.
I have a small suggestion for the `ChunkSplitter` interface.
The methods `open, hasNextChunk, close, getCurrentSplittingTableId, close`
are more like those of an `Iterator` or `Cursor`. Perhaps
`ChunkSplitter.generateSplits` could be designed to open a `Cursor`, which can
unify the iteration logic and support both one-time splitting and partial
splitting. By using cursors, we might be able to support simultaneous spliting
of multiple tables.
However, maintaining the state might become complex, as we need to keep
track of the state of all open cursors.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]