gemini-code-assist[bot] commented on code in PR #39157:
URL: https://github.com/apache/beam/pull/39157#discussion_r3493582825


##########
sdks/java/io/iceberg/src/main/java/org/apache/beam/sdk/io/iceberg/RecordWriterManager.java:
##########
@@ -163,7 +163,10 @@ boolean write(Record record) {
 
       @Nullable RecordWriter writer = writers.getIfPresent(routingPartitionKey);
       if (writer == null && openWriters >= maxNumWriters) {
-        return false;
+        writers.cleanUp();
+        if (openWriters >= maxNumWriters) {
+          return false;
+        }
       }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Calling `writers.cleanUp()` on every single write attempt that results in a
   spill can introduce significant CPU overhead and lock contention, especially
   when processing a large volume of records for partitions that exceed
   `maxNumWriters`. Since Guava's `cleanUp()` acquires locks on all cache
   segments, calling it repeatedly in a tight loop for every spilled record is
   inefficient.

   To optimize this, consider rate-limiting the `cleanUp()` calls (e.g., at
   most once every second) by tracking the last cleanup time using a timestamp
   field (e.g., `lastCleanupNanos`) in `DestinationState`.

   Note: You will need to define `private long lastCleanupNanos = 0L;` as a
   field in the `DestinationState` class for this suggestion to compile.
   
   ```java
         if (writer == null && openWriters >= maxNumWriters) {
           long now = System.nanoTime();
           if (now - lastCleanupNanos > 1000000000L) {
             writers.cleanUp();
             lastCleanupNanos = now;
           }
           if (openWriters >= maxNumWriters) {
             return false;
           }
         }
   ```
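
   The rate-limiting idea above can also be sketched in isolation. The
   following is a minimal, dependency-free illustration, not code from the PR:
   `CleanupThrottle`, `maybeRun`, and the injected clock are hypothetical names
   introduced here, and the sketch assumes single-threaded access. It tracks
   whether the action has ever run instead of seeding the timestamp with `0L`,
   since `System.nanoTime()` has an arbitrary origin and may be small or
   negative.

   ```java
   import java.util.function.LongSupplier;

   /** Hypothetical helper (not part of Beam or Guava): runs an action at most once per interval. */
   final class CleanupThrottle {
     private final long intervalNanos;
     private final LongSupplier clock; // e.g. System::nanoTime in production code
     private boolean hasRun = false;
     private long lastRunNanos;

     CleanupThrottle(long intervalNanos, LongSupplier clock) {
       this.intervalNanos = intervalNanos;
       this.clock = clock;
     }

     /** Runs the action if it has never run or the interval has elapsed; returns whether it ran. */
     boolean maybeRun(Runnable action) {
       long now = clock.getAsLong();
       // Comparing the difference (not the raw values) stays correct if the clock wraps around.
       if (hasRun && now - lastRunNanos < intervalNanos) {
         return false;
       }
       action.run();
       lastRunNanos = now;
       hasRun = true;
       return true;
     }
   }
   ```

   With such a helper, the spill check would call something like
   `throttle.maybeRun(writers::cleanUp)` before re-testing
   `openWriters >= maxNumWriters`; whether a separate class is preferable to an
   inline timestamp field is a design choice for the PR author.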


