reuvenlax commented on a change in pull request #14852:
URL: https://github.com/apache/beam/pull/14852#discussion_r638318157



##########
File path: runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/GroupIntoBatchesOverride.java
##########
@@ -87,14 +94,22 @@ private BatchGroupIntoBatches(long batchSize) {
                   new DoFn<KV<K, Iterable<V>>, KV<K, Iterable<V>>>() {
                     @ProcessElement
                     public void process(ProcessContext c) {
-                      // Iterators.partition lazily creates the partitions as they are accessed
-                      // allowing it to partition very large iterators.
-                      Iterator<List<V>> iterator =
-                          Iterators.partition(c.element().getValue().iterator(), (int) batchSize);
-
-                      // Note that GroupIntoBatches only outputs when the batch is non-empty.
-                      while (iterator.hasNext()) {
-                        c.output(KV.of(c.element().getKey(), iterator.next()));
+                      List<V> currentBatch = Lists.newArrayList();
+                      long batchSizeBytes = 0;
+                      for (V element : c.element().getValue()) {
+                        currentBatch.add(element);
+                        if (weigher != null) {
+                          batchSizeBytes += weigher.apply(element);
+                        }
+                        if (currentBatch.size() == maxBatchSizeElements
+                            || (maxBatchSizeBytes != Long.MAX_VALUE
+                                && batchSizeBytes >= maxBatchSizeBytes)) {
+                          c.output(KV.of(c.element().getKey(), currentBatch));
+                          // Call clear() since that allows us to reuse the array memory for
+                          // subsequent batches.
+                          currentBatch.clear();
+                          batchSizeBytes = 0;
+                        }
                       }

Review comment:
       Yes - I noticed that as well and pushed a fix.
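
       For readers following the hunk above, here is a minimal standalone
       sketch of the count-or-bytes batching loop, written outside Beam.
       The names `BatchSketch` and `batch` and all of its parameters are
       hypothetical and used only for illustration. Allocating a fresh list
       per batch and flushing the trailing partial batch are assumptions
       about what a safe variant might look like; they are not a statement
       of what the pushed fix actually changed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

public class BatchSketch {
  /**
   * Splits values into batches capped by element count and, when maxBytes is
   * not Long.MAX_VALUE, by the summed weight reported by weigher.
   * Hypothetical helper for illustration; not part of Beam.
   */
  static <V> List<List<V>> batch(
      Iterable<V> values, long maxElements, long maxBytes, ToLongFunction<V> weigher) {
    List<List<V>> out = new ArrayList<>();
    List<V> current = new ArrayList<>();
    long currentBytes = 0;
    for (V value : values) {
      current.add(value);
      if (weigher != null) {
        currentBytes += weigher.applyAsLong(value);
      }
      if (current.size() >= maxElements
          || (maxBytes != Long.MAX_VALUE && currentBytes >= maxBytes)) {
        out.add(current);
        // Assumption: start a fresh list rather than calling clear(), so a
        // batch that was already emitted is never mutated afterwards.
        current = new ArrayList<>();
        currentBytes = 0;
      }
    }
    // Assumption: emit the trailing partial batch, but only if it is non-empty.
    if (!current.isEmpty()) {
      out.add(current);
    }
    return out;
  }

  public static void main(String[] args) {
    List<String> values = Arrays.asList("a", "bb", "ccc", "d", "e");
    // Count limit only: at most two elements per batch.
    System.out.println(batch(values, 2, Long.MAX_VALUE, null));
    // Byte limit of 3, weighing each string by its length.
    System.out.println(batch(values, 10, 3, String::length));
  }
}
```

       The sketch trades the array reuse mentioned in the diff comment for
       batches that stay valid after they are emitted; whether that trade is
       needed depends on whether the consumer copies the list.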