gemini-code-assist[bot] commented on code in PR #39155:
URL: https://github.com/apache/beam/pull/39155#discussion_r3493575765


##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/AssignDestinationsAndPartitions.java:
##########
@@ -84,6 +84,7 @@ static class AssignDoFn extends DoFn<Row, KV<Row, Row>> {
     private transient @MonotonicNonNull Map<String, PartitionKey> partitionKeys;
     private transient @MonotonicNonNull Map<String, BeamRowWrapper> wrappers;
     private transient @MonotonicNonNull Map<String, Instant> lastRefreshTimes;
+    private transient @MonotonicNonNull Map<String, Row> keyCache;

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   A single flat map keyed by a concatenated string (`tableIdentifier + "|" + partitionPath`) allocates a new `String` for every input element, even on cache hits. Because this DoFn may process millions of rows, that per-element cost adds up. A nested map, `Map<String, Map<String, Row>>`, avoids it.
   
   ```suggestion
       private transient @MonotonicNonNull Map<String, Map<String, Row>> keyCache;
   ```
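
To illustrate the allocation concern, here is a minimal sketch using plain strings rather than Beam types. `FlatKeyDemo` and its methods are hypothetical names invented for this example, and the cached values are ordinary `String`s standing in for `Row`. It shows that concatenating two non-constant strings at run time yields a new `String` object on each call, so the flat-map lookup pays for the key even when the entry is already cached.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the flat-map cache; values are Strings, not Beam Rows.
class FlatKeyDemo {
  final Map<String, String> cache = new HashMap<>();

  // Builds the concatenated key the same way the diff does.
  static String cacheKey(String tableIdentifier, String partitionPath) {
    return tableIdentifier + "|" + partitionPath;
  }

  String lookup(String tableIdentifier, String partitionPath) {
    // The key is built before the lookup, whether it hits or misses.
    String key = cacheKey(tableIdentifier, partitionPath);
    return cache.computeIfAbsent(key, k -> "row:" + k);
  }
}
```

Two calls to `cacheKey` with the same arguments return keys that are equal but not the same object, while `lookup` returns the same cached value both times.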



##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/AssignDestinationsAndPartitions.java:
##########
@@ -175,8 +177,13 @@ public void processElement(
 
       String partitionPath = partitionKey.toPath();
 
-      Row destAndPartition =
-          Row.withSchema(OUTPUT_SCHEMA).addValues(tableIdentifier, partitionPath).build();
+      String cacheKey = tableIdentifier + "|" + partitionPath;
+      Row destAndPartition = checkStateNotNull(keyCache).get(cacheKey);
+      if (destAndPartition == null) {
+        destAndPartition =
+            Row.withSchema(OUTPUT_SCHEMA).addValues(tableIdentifier, partitionPath).build();
+        keyCache.put(cacheKey, destAndPartition);
+      }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   Instead of concatenating `tableIdentifier` and `partitionPath` into a new 
`String` on every element, use a nested map lookup. This avoids any object 
allocation on cache hits, which is the common case.
   
   ```java
         Map<String, Map<String, Row>> cache = checkStateNotNull(keyCache);
         Map<String, Row> tableCache = cache.get(tableIdentifier);
         if (tableCache == null) {
           tableCache = new HashMap<>();
           cache.put(tableIdentifier, tableCache);
         }
         Row destAndPartition = tableCache.get(partitionPath);
         if (destAndPartition == null) {
           destAndPartition =
               Row.withSchema(OUTPUT_SCHEMA).addValues(tableIdentifier, partitionPath).build();
           tableCache.put(partitionPath, destAndPartition);
         }
   ```
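
The nested lookup can be tried outside Beam with a small self-contained sketch. `NestedKeyCache`, `getOrCreate`, and `builds` are hypothetical names for this example only, and the type parameter `V` stands in for `Row`. The value factory is stored once as a field, an assumption made here so that a cache hit does nothing beyond two `HashMap.get` calls.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

// Hypothetical generic cache keyed by (table, partition); V stands in for Beam's Row.
class NestedKeyCache<V> {
  private final Map<String, Map<String, V>> byTable = new HashMap<>();
  private final BiFunction<String, String, V> factory;
  private int builds = 0;

  NestedKeyCache(BiFunction<String, String, V> factory) {
    this.factory = factory;
  }

  V getOrCreate(String tableIdentifier, String partitionPath) {
    // First level: one inner map per table, created on first use.
    Map<String, V> tableCache = byTable.get(tableIdentifier);
    if (tableCache == null) {
      tableCache = new HashMap<>();
      byTable.put(tableIdentifier, tableCache);
    }
    // Second level: the value for this partition, built only on a miss.
    V value = tableCache.get(partitionPath);
    if (value == null) {
      value = factory.apply(tableIdentifier, partitionPath);
      tableCache.put(partitionPath, value);
      builds++;
    }
    return value;
  }

  // Number of times the factory ran, i.e. the number of cache misses.
  int builds() {
    return builds;
  }
}
```

For example, `new NestedKeyCache<String>((t, p) -> t + "/" + p)` returns the same instance for repeated `(table, partition)` pairs and runs the factory once per distinct pair. Because the two levels are looked up separately, no delimiter is involved, so a table identifier or partition path that happens to contain `|` cannot collide with another pair.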


