WencongLiu commented on code in PR #3384:
URL: https://github.com/apache/paimon/pull/3384#discussion_r1616650922
##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/sink/FlinkSinkBuilder.java:
##########
@@ -119,8 +133,86 @@ public FlinkSinkBuilder inputBounded(boolean bounded) {
return this;
}
+ /** Set the table sort info. */
+ public FlinkSinkBuilder setTableSortInfo(
+ String sortColumnsString,
+ String sortStrategy,
+ boolean sortInCluster,
+ int sampleFactor) {
+ // 1. The table sort will be ignored if the sort columns are not specified.
+ if (sortColumnsString == null || sortColumnsString.isEmpty()) {
+ return this;
+ }
+ // 2. Check the table type.
+ checkState(
+ table.bucketMode().equals(BUCKET_UNAWARE),
+ "Clustering only supports bucket unaware table without primary keys.");
+ // 3. Check the sort columns.
+ List<String> sortColumns = Arrays.asList(sortColumnsString.split(","));
+ List<String> fieldNames = table.schema().fieldNames();
+ checkState(
+ new HashSet<>(fieldNames).containsAll(new HashSet<>(sortColumns)),
+ String.format(
+ "Field names %s should contains all clustering column names %s.",
+ fieldNames, sortColumns));
+ // 4. Check the execution mode.
+ checkState(input != null, "The input stream should be specified earlier.");
+ if (boundedInput == null) {
+ boundedInput = !FlinkSink.isStreaming(input);
+ }
+ checkState(boundedInput, "The clustering should be executed under batch mode.");
Review Comment:
@JingsongLi
Good point. Ignoring clustering in streaming mode is a sensible design: users
no longer need to manually adjust the table configuration when running in
streaming mode.
@xintongsong
1. I've removed the check and added a related warning log.
2. I've added the table type and batch mode limitations to the configuration
description. I've also added a commit that introduces the clustering feature in
the Paimon docs.
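
For illustration only, the behaviour described in point 1 could look roughly like the sketch below. This is a hypothetical, standalone example and not the code in the PR: the class name `ClusteringModeSketch`, the method `shouldApplyClustering`, and the log message are all assumptions, and `java.util.logging` stands in for whatever logger the builder actually uses.

```java
import java.util.logging.Logger;

/** Hypothetical sketch; not the actual FlinkSinkBuilder code. */
final class ClusteringModeSketch {
    private static final Logger LOG =
            Logger.getLogger(ClusteringModeSketch.class.getName());

    /**
     * Decides whether clustering should be applied.
     * Instead of failing with checkState in streaming mode, it logs a
     * warning and reports that clustering is skipped.
     */
    static boolean shouldApplyClustering(String sortColumnsString, boolean boundedInput) {
        // No sort columns configured: nothing to cluster.
        if (sortColumnsString == null || sortColumnsString.isEmpty()) {
            return false;
        }
        // Streaming input: ignore the clustering settings, but tell the user.
        if (!boundedInput) {
            LOG.warning(
                    "Clustering is only supported in batch mode; ignoring sort columns: "
                            + sortColumnsString);
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(shouldApplyClustering("a,b", true));
        System.out.println(shouldApplyClustering("a,b", false));
        System.out.println(shouldApplyClustering("", true));
    }
}
```

With this shape, a streaming job that carries clustering options still starts normally and only emits a warning, while a batch job with sort columns proceeds to clustering.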