jsancio commented on code in PR #15007:
URL: https://github.com/apache/kafka/pull/15007#discussion_r1427486933


##########
metadata/src/main/java/org/apache/kafka/metadata/migration/BufferingBatchConsumer.java:
##########
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.kafka.metadata.migration;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.function.Consumer;
+
+/**
+ * A record batch consumer that merges incoming batches into batches of at least a given minimum size. It does so
+ * by buffering the records into an array that is later flushed to a downstream consumer. Batches consumed
+ * by this class will not be broken apart, only combined with other batches to reach the minimum batch size.
+ * <p>
+ * Note that {@link #close()} must be called after the last batch has been accepted in order to flush any
+ * buffered records.
+ */
+public class BufferingBatchConsumer<T> implements Consumer<List<T>> {
+
+    private final Consumer<List<T>> delegateConsumer;
+    private final List<T> bufferedBatch;
+    private final int minBatchSize;
+
+    BufferingBatchConsumer(Consumer<List<T>> delegateConsumer, int minBatchSize) {
+        this.delegateConsumer = delegateConsumer;
+        this.bufferedBatch = new ArrayList<>(minBatchSize);
+        this.minBatchSize = minBatchSize;
+    }
+
+    @Override
+    public void accept(List<T> batch) {
+        bufferedBatch.addAll(batch);
+        if (bufferedBatch.size() >= minBatchSize) {
+            delegateConsumer.accept(new ArrayList<>(bufferedBatch));
+            bufferedBatch.clear();
+        }
+    }
+
+    public void close() {

Review Comment:
   Close doesn't seem to be the correct semantic here, since:
   1. This type doesn't use any resource outside of main memory.
   2. The user can continue to use the object after calling close.
   
   I think a more accurate name, generally used in Java, is `flush`, `force`, or `drain`.
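   A minimal sketch of what the rename could look like. The method body here is an assumption, since the diff above cuts off before the implementation of `close()`:

```java
    /**
     * Flushes any buffered records to the delegate consumer. Unlike close(),
     * the name makes clear that the consumer remains usable afterwards.
     */
    public void flush() {
        if (!bufferedBatch.isEmpty()) {
            delegateConsumer.accept(new ArrayList<>(bufferedBatch));
            bufferedBatch.clear();
        }
    }
```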



##########
metadata/src/main/java/org/apache/kafka/metadata/migration/KRaftMigrationDriver.java:
##########
@@ -645,6 +648,29 @@ public void run() throws Exception {
         }
     }
 
+    private BufferingBatchConsumer<ApiMessageAndVersion> buildMigrationBatchConsumer(
+        MigrationManifest.Builder manifestBuilder
+    ) {
+        return new BufferingBatchConsumer<>(batch -> {
+            try {
+                if (log.isTraceEnabled()) {
+                    batch.forEach(apiMessageAndVersion ->
+                        log.trace(recordRedactor.toLoggableString(apiMessageAndVersion.message())));
+                }
+                CompletableFuture<?> future = zkRecordConsumer.acceptBatch(batch);

Review Comment:
   Related question: what happens if KRaft loses leadership in the middle of this consumer loop?



##########
metadata/src/main/java/org/apache/kafka/metadata/migration/KRaftMigrationDriver.java:
##########
@@ -664,23 +690,12 @@ public void run() throws Exception {
                 super.handleException(t);
             }
             try {
-                zkMigrationClient.readAllMetadata(batch -> {
-                    try {
-                        log.info("Migrating {} records from ZK", batch.size());
-                        if (log.isTraceEnabled()) {
-                            batch.forEach(apiMessageAndVersion ->
-                                log.trace(recordRedactor.toLoggableString(apiMessageAndVersion.message())));
-                        }
-                        CompletableFuture<?> future = zkRecordConsumer.acceptBatch(batch);
-                        FutureUtils.waitWithLogging(KRaftMigrationDriver.this.log, "",
-                            "the metadata layer to commit migration record batch",
-                            future, Deadline.fromDelay(time, METADATA_COMMIT_MAX_WAIT_MS, TimeUnit.MILLISECONDS), time);
-                        manifestBuilder.acceptBatch(batch);
-                    } catch (Throwable e) {
-                        // This will cause readAllMetadata to throw since this batch consumer is called directly from readAllMetadata
-                        throw new RuntimeException(e);
-                    }
-                }, brokersInMetadata::add);
+                BufferingBatchConsumer<ApiMessageAndVersion> migrationBatchConsumer = buildMigrationBatchConsumer(manifestBuilder);
+                zkMigrationClient.readAllMetadata(
+                    migrationBatchConsumer,
+                    brokersInMetadata::add
+                );
+                migrationBatchConsumer.close();

Review Comment:
   If `zkMigrationClient.readAllMetadata` throws, `migrationBatchConsumer.close` is not called. Is this okay because `zkRecordConsumer.abortMigration` is called in the `catch`?
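   For illustration, a hedged sketch of restructuring the call site so the final flush cannot be skipped silently; the names come from the diff above, `flush` follows the rename suggested earlier, and whether re-throwing is the right recovery here is an open question:

```java
BufferingBatchConsumer<ApiMessageAndVersion> migrationBatchConsumer =
    buildMigrationBatchConsumer(manifestBuilder);
try {
    zkMigrationClient.readAllMetadata(migrationBatchConsumer, brokersInMetadata::add);
    // Flush the tail of the buffer only after every ZK batch was read successfully;
    // on failure the buffered records are dropped and the migration is aborted below.
    migrationBatchConsumer.flush();
} catch (Throwable t) {
    zkRecordConsumer.abortMigration();
    throw t;
}
```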



##########
metadata/src/main/java/org/apache/kafka/metadata/migration/KRaftMigrationDriver.java:
##########
@@ -645,6 +648,29 @@ public void run() throws Exception {
         }
     }
 
+    private BufferingBatchConsumer<ApiMessageAndVersion> buildMigrationBatchConsumer(
+        MigrationManifest.Builder manifestBuilder
+    ) {
+        return new BufferingBatchConsumer<>(batch -> {
+            try {
+                if (log.isTraceEnabled()) {
+                    batch.forEach(apiMessageAndVersion ->
+                        log.trace(recordRedactor.toLoggableString(apiMessageAndVersion.message())));
+                }
+                CompletableFuture<?> future = zkRecordConsumer.acceptBatch(batch);
+                long batchStart = time.nanoseconds();
+                FutureUtils.waitWithLogging(KRaftMigrationDriver.this.log, "",

Review Comment:
   In general, Kafka should avoid blocking on a CompletableFuture. This can be avoided by using `CompletableFuture::thenCompose`, or better yet `concurrent.Flow`, since the `CompletableFuture` doesn't return an interesting value.
   
   I looked at `ZkMigrationClient`. If you wanted to use `Flow`, you would replace the use of `Consumer` with `Flow.Subscriber`, and `ZkMigrationClient` would become a `Flow.Publisher`.
   
   Flow has support for pipelining and back-pressure. For example, you would make the initial `Subscription.request` `1000` and request more data as the `zkRecordConsumer` processes more batches.
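   A minimal sketch of the Subscriber half of that suggestion; the class name, the `processBatch` hook, and the request sizes are illustrative assumptions, and `ZkMigrationClient` would have to be adapted to act as the matching `Flow.Publisher`:

```java
import java.util.List;
import java.util.concurrent.Flow;

class MigrationBatchSubscriber<T> implements Flow.Subscriber<List<T>> {
    private Flow.Subscription subscription;

    @Override
    public void onSubscribe(Flow.Subscription subscription) {
        this.subscription = subscription;
        // Back-pressure: ask the publisher for an initial window of batches
        // instead of letting it push records unboundedly.
        subscription.request(1000);
    }

    @Override
    public void onNext(List<T> batch) {
        processBatch(batch); // hypothetical hook: hand the batch to zkRecordConsumer
        // Request more data only as batches are processed, rather than blocking
        // the reader thread on a CompletableFuture for each batch.
        subscription.request(1);
    }

    @Override
    public void onError(Throwable throwable) {
        // e.g. abort the migration
    }

    @Override
    public void onComplete() {
        // e.g. flush any buffered records
    }

    private void processBatch(List<T> batch) {
        // illustrative no-op
    }
}
```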



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
