rdblue commented on code in PR #7688:
URL: https://github.com/apache/iceberg/pull/7688#discussion_r1206085137


##########
core/src/main/java/org/apache/iceberg/BaseScan.java:
##########
@@ -256,4 +265,95 @@ private static Schema lazyColumnProjection(TableScanContext context, Schema sche
   public ThisT metricsReporter(MetricsReporter reporter) {
     return newRefinedScan(table(), schema(), context().reportWith(reporter));
   }
+
+  private Optional<Long> adaptiveSplitSize(long tableSplitSize) {
+    if (!PropertyUtil.propertyAsBoolean(
+        table.properties(),
+        TableProperties.ADAPTIVE_SPLIT_PLANNING,
+        TableProperties.ADAPTIVE_SPLIT_PLANNING_DEFAULT)) {
+      return Optional.empty();
+    }
+
+    int minParallelism =
+        PropertyUtil.propertyAsInt(
+            table.properties(),
+            TableProperties.SPLIT_MIN_PARALLELISM,
+            TableProperties.SPLIT_MIN_PARALLELISM_DEFAULT);
+
+    Preconditions.checkArgument(minParallelism > 0, "Minimum parallelism must be a positive value");
+
+    Snapshot snapshot =
+        Stream.of(context.snapshotId(), context.toSnapshotId())
+            .filter(Objects::nonNull)
+            .map(table::snapshot)
+            .findFirst()
+            .orElseGet(table::currentSnapshot);
+
+    if (snapshot == null || snapshot.summary() == null) {
+      return Optional.empty();
+    }
+
+    Map<String, String> summary = snapshot.summary();
+    long totalFiles =
+        PropertyUtil.propertyAsLong(summary, SnapshotSummary.TOTAL_DATA_FILES_PROP, 0);
+    long totalSize = PropertyUtil.propertyAsLong(summary, SnapshotSummary.TOTAL_FILE_SIZE_PROP, 0);
+
+    if (totalFiles <= 0 || totalSize <= 0) {
+      return Optional.empty();
+    }
+
+    if (totalFiles > minParallelism && totalSize >= tableSplitSize * minParallelism) {

Review Comment:
   Do you think we would want to use partition stats instead of total size / total files in the future?
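
The condition at the end of the excerpt compares snapshot-wide totals (`SnapshotSummary.TOTAL_DATA_FILES_PROP` and `TOTAL_FILE_SIZE_PROP`) against `minParallelism`. Because those totals describe the whole table, a filtered scan that touches only a few partitions may pass the check while reading a small slice of the data. A minimal sketch of the question above, assuming hypothetical per-partition file counts and byte sizes are available: `AdaptiveSplitCheck` and `PartitionSummary` are illustrative names, not Iceberg APIs, and the first method only restates the guards visible in the diff (the rest of `adaptiveSplitSize` is not shown).

```java
import java.util.List;
import java.util.function.Predicate;

// Sketch only: restates the visible size check as a pure function and adds a
// hypothetical partition-level variant. PartitionSummary is not an Iceberg API.
final class AdaptiveSplitCheck {

  /** Hypothetical per-partition statistics (file count and bytes) for one partition. */
  static final class PartitionSummary {
    final String partition;
    final long dataFileCount;
    final long totalSizeBytes;

    PartitionSummary(String partition, long dataFileCount, long totalSizeBytes) {
      this.partition = partition;
      this.dataFileCount = dataFileCount;
      this.totalSizeBytes = totalSizeBytes;
    }
  }

  private AdaptiveSplitCheck() {}

  // Same guards as the excerpt: minParallelism must be positive, missing or
  // non-positive totals disable the check, and otherwise the data must span more
  // than minParallelism files and at least minParallelism splits' worth of bytes.
  static boolean exceedsMinParallelism(
      long totalFiles, long totalSize, long splitSize, int minParallelism) {
    if (minParallelism <= 0) {
      throw new IllegalArgumentException("Minimum parallelism must be a positive value");
    }
    if (totalFiles <= 0 || totalSize <= 0) {
      return false;
    }
    return totalFiles > minParallelism && totalSize >= splitSize * minParallelism;
  }

  // Hypothetical variant: sum stats over only the partitions the scan selects,
  // instead of using snapshot-wide totals, then apply the same check.
  static boolean exceedsMinParallelism(
      List<PartitionSummary> partitions,
      Predicate<String> selected,
      long splitSize,
      int minParallelism) {
    long files = 0;
    long bytes = 0;
    for (PartitionSummary p : partitions) {
      if (selected.test(p.partition)) {
        files += p.dataFileCount;
        bytes += p.totalSizeBytes;
      }
    }
    return exceedsMinParallelism(files, bytes, splitSize, minParallelism);
  }
}
```

For example, with hypothetical partitions `a` (3 files, 600 MiB), `b` (3 files, 600 MiB) and `c` (100 files, 10 GiB), a 128 MiB split size and `minParallelism` of 4, the table-wide totals pass the check, but a scan pruned to `a` alone does not (3 files is not more than 4).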

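The snapshot lookup in the excerpt prefers the scan's `snapshotId()`, then `toSnapshotId()`, then the table's current snapshot. One detail worth checking: `Stream.findFirst()` throws `NullPointerException` when the selected element is null, and `Table.snapshot(long)` returns null for an ID with no matching snapshot, so an unknown ID would surface as an NPE rather than fall back. Whether that can actually happen here depends on validation elsewhere in the scan. The sketch below reproduces both shapes on plain Java types; `SnapshotSelection` and its methods are hypothetical, with a `Map` standing in for the table and `String` for `Snapshot`.

```java
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Sketch only: a Map<Long, String> stands in for table::snapshot, and String stands
// in for Snapshot. Like Table#snapshot(long), Map#get returns null for unknown IDs.
final class SnapshotSelection {
  private SnapshotSelection() {}

  // Same pipeline shape as the excerpt. If the first non-null ID is unknown, map()
  // yields null and findFirst() throws NullPointerException.
  static String selectAsWritten(
      Long snapshotId, Long toSnapshotId, Map<Long, String> snapshots, Supplier<String> current) {
    return Stream.of(snapshotId, toSnapshotId)
        .filter(Objects::nonNull)
        .map(snapshots::get)
        .findFirst()
        .orElseGet(current);
  }

  // Null-safe variant: drop failed lookups so the chain falls through to the
  // next ID and finally to the current snapshot.
  static String selectNullSafe(
      Long snapshotId, Long toSnapshotId, Map<Long, String> snapshots, Supplier<String> current) {
    return Stream.of(snapshotId, toSnapshotId)
        .filter(Objects::nonNull)
        .map(snapshots::get)
        .filter(Objects::nonNull)
        .findFirst()
        .orElseGet(current);
  }
}
```

Silently falling through to `toSnapshotId()` when `snapshotId()` is unknown is itself a design choice; failing fast with a clear error message would be an equally reasonable alternative.
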


-- 
This is an automated message from the Apache Git Service.