tomstepp commented on code in PR #36720:
URL: https://github.com/apache/beam/pull/36720#discussion_r2548177316


##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/WriteToDestinations.java:
##########
@@ -135,10 +216,33 @@ private PCollection<FileWriteResult> writeUntriggered(PCollection<KV<String, Row
             .apply("Group spilled rows by destination shard", GroupByKey.create())
             .apply(
                 "Write remaining rows to files",
-                new WriteGroupedRowsToFiles(catalogConfig, dynamicDestinations, filePrefix));
+                new WriteGroupedRowsToFiles(
+                    catalogConfig, dynamicDestinations, filePrefix, DEFAULT_MAX_BYTES_PER_FILE));
 
     return PCollectionList.of(writeUngroupedResult.getWrittenFiles())
         .and(writeGroupedResult)
         .apply("Flatten Written Files", Flatten.pCollections());
   }
+
+  /**
+   * A SerializableFunction to estimate the byte size of a Row for bundling purposes. This is a
+   * heuristic that avoids the high cost of encoding each row with a Coder.
+   */
+  private static class RowSizer implements SerializableFunction<KV<String, Row>, Integer> {
+    @Override
+    public Integer apply(KV<String, Row> element) {
+      Row row = element.getValue();
+      int size = 0;
+      for (Object value : row.getValues()) {
+        if (value instanceof String) {
+          size += Utf8.encodedLength((String) value);
+        } else if (value instanceof byte[]) {
+          size += ((byte[]) value).length;
+        } else {
+          size += 8; // Approximation for non-string/byte fields
+        }
+      }

Review Comment:
   Done. Added size calculation for lists, maps, and nested rows. Null values 
now count as zero bytes.
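
   For readers following the thread, the recursive estimate described above could look roughly like the sketch below. It is not the code from the PR: `RowSizeEstimator`, `NestedRow`, and `FIXED_FIELD_BYTES` are hypothetical names, `NestedRow` is a plain stand-in for a nested Beam `Row` so the example runs without Beam on the classpath, and `String.getBytes` replaces Guava's `Utf8.encodedLength` for the same reason (it allocates, which the real heuristic is presumably trying to avoid).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RowSizeEstimator {
  /** Hypothetical stand-in for a nested Beam Row: an ordered list of field values. */
  public static final class NestedRow {
    final List<Object> values;

    public NestedRow(Object... values) {
      this.values = Arrays.asList(values);
    }
  }

  /** Assumed flat cost for any value that is not a string, byte array, or container. */
  static final int FIXED_FIELD_BYTES = 8;

  /** Recursively estimates the byte size of a field value without encoding it. */
  public static long estimate(Object value) {
    if (value == null) {
      return 0; // null values contribute nothing
    }
    if (value instanceof String) {
      return ((String) value).getBytes(StandardCharsets.UTF_8).length;
    }
    if (value instanceof byte[]) {
      return ((byte[]) value).length;
    }
    long size = 0;
    if (value instanceof NestedRow) {
      for (Object field : ((NestedRow) value).values) {
        size += estimate(field);
      }
      return size;
    }
    if (value instanceof Iterable) {
      for (Object item : (Iterable<?>) value) {
        size += estimate(item);
      }
      return size;
    }
    if (value instanceof Map) {
      for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
        size += estimate(entry.getKey()) + estimate(entry.getValue());
      }
      return size;
    }
    return FIXED_FIELD_BYTES; // approximation for numbers, booleans, timestamps, etc.
  }

  public static void main(String[] args) {
    Map<String, Object> attrs = new HashMap<>();
    attrs.put("k", 7);
    NestedRow row =
        new NestedRow(
            "h\u00e9llo", new byte[] {1, 2, 3}, 42L, null,
            Arrays.asList("a", "bc"), attrs, new NestedRow("x", 1));
    System.out.println("Estimated bytes: " + estimate(row));
  }
}
```

   Note that a sketch like this returns `long` rather than `Integer`; whether the summed estimate can overflow an `int` for very wide or deeply nested rows depends on the data, so treat that as a design question rather than a finding about the PR.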



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
