Guosmilesmile commented on code in PR #14837:
URL: https://github.com/apache/iceberg/pull/14837#discussion_r2619614998
##########
core/src/main/java/org/apache/iceberg/actions/SizeBasedFileRewritePlanner.java:
##########
@@ -107,6 +107,15 @@ public abstract class SizeBasedFileRewritePlanner<
public static final long MAX_FILE_GROUP_SIZE_BYTES_DEFAULT = 100L * 1024 * 1024 * 1024; // 100 GB
+ /**
+ * This option controls the largest count of data that should be rewritten in a single file group.
+ * It helps with breaking down the rewriting of very large partitions which may not be rewritable
+ * otherwise due to the resource constraints of the cluster.
+ */
+ public static final String MAX_FILE_GROUP_COUNT = "max-file-group-count";
Review Comment:
Fix it.
##########
core/src/main/java/org/apache/iceberg/util/BinPacking.java:
##########
@@ -36,38 +36,51 @@ public static class ListPacker<T> {
private final long targetWeight;
private final int lookback;
private final boolean largestBinFirst;
+ private final long maxSize;
public ListPacker(long targetWeight, int lookback, boolean largestBinFirst) {
+ this(targetWeight, lookback, largestBinFirst, Long.MAX_VALUE);
+ }
+
+ public ListPacker(long targetWeight, int lookback, boolean largestBinFirst, long maxSize) {
this.targetWeight = targetWeight;
this.lookback = lookback;
this.largestBinFirst = largestBinFirst;
+ this.maxSize = maxSize;
}
public List<List<T>> packEnd(List<T> items, Function<T, Long> weightFunc) {
return Lists.reverse(
ImmutableList.copyOf(
Iterables.transform(
new PackingIterable<>(
- Lists.reverse(items), targetWeight, lookback, weightFunc, largestBinFirst),
+ Lists.reverse(items),
+ targetWeight,
+ lookback,
+ weightFunc,
+ largestBinFirst,
+ maxSize),
Review Comment:
Fix it.
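
Note on the hunk above: a minimal, self-contained sketch of how a per-bin item cap can sit alongside a weight target. It assumes that `maxSize` limits the number of items in one bin, which the diff does not confirm. `CappedPacker`, `pack`, and `maxItems` are hypothetical names used only for illustration. This is a plain sequential packer, not the lookback-based `PackingIterable` in `BinPacking.java`, and it is not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToLongFunction;

// Hypothetical illustration only; not part of Iceberg.
public class CappedPacker {

  /**
   * Packs items in input order. The current bin is closed when adding the next
   * item would exceed targetWeight, or when the bin already holds maxItems items.
   * A non-empty check guarantees that every item lands in some bin, even if it
   * is heavier than targetWeight on its own.
   */
  public static <T> List<List<T>> pack(
      List<T> items, ToLongFunction<T> weightFunc, long targetWeight, long maxItems) {
    List<List<T>> bins = new ArrayList<>();
    List<T> current = new ArrayList<>();
    long currentWeight = 0;

    for (T item : items) {
      long weight = weightFunc.applyAsLong(item);
      boolean overWeight = currentWeight + weight > targetWeight;
      boolean overCount = current.size() >= maxItems;

      if (!current.isEmpty() && (overWeight || overCount)) {
        bins.add(current);
        current = new ArrayList<>();
        currentWeight = 0;
      }

      current.add(item);
      currentWeight += weight;
    }

    if (!current.isEmpty()) {
      bins.add(current);
    }
    return bins;
  }

  public static void main(String[] args) {
    List<Long> sizes = List.of(10L, 10L, 10L, 10L, 10L);
    // Weight target is never reached here, so the item cap of 2 decides the split.
    System.out.println(pack(sizes, Long::longValue, 100L, 2L));
  }
}
```

Passing `Long.MAX_VALUE` as the cap leaves only the weight target in effect, which mirrors the default chosen by the three-argument `ListPacker` constructor in the diff.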
##########
docs/docs/flink-maintenance.md:
##########
@@ -218,6 +218,7 @@ env.execute("Table Maintenance Job");
| `partialProgressMaxCommits(int)` | Maximum commits allowed for partial progress when partialProgressEnabled is true | 10 | int |
| `maxRewriteBytes(long)` | Maximum bytes to rewrite per execution | Long.MAX_VALUE | long |
| `filter(Expression)` | Filter expression for selecting files to rewrite | Expressions.alwaysTrue() | Expression |
+| `maxFileGroupCount(long)` | Maximum total count of a file group | Long.MAX_VALUE | long |
Review Comment:
Ok
##########
flink/v2.1/flink/src/main/java/org/apache/iceberg/flink/maintenance/api/RewriteDataFiles.java:
##########
@@ -181,6 +181,18 @@ public Builder maxFileGroupSizeBytes(long maxFileGroupSizeBytes) {
return this;
}
+ /**
+ * Configures the max file count for rewriting. See {@link
+ * SizeBasedFileRewritePlanner#MAX_FILE_GROUP_COUNT} for more details.
+ *
+ * @param maxFileGroupCount file count for rewrite
+ */
+ public Builder maxFileGroupCount(long maxFileGroupCount) {
Review Comment:
Fix it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]