gemini-code-assist[bot] commented on code in PR #38534:
URL: https://github.com/apache/beam/pull/38534#discussion_r3263998528


##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineResult.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+/**
+ * Forwards {@link PipelineResult} calls to a delegate and stops an embedded job server when the
+ * pipeline reaches a terminal state.
+ */
+class KafkaStreamsPipelineResult implements PipelineResult {
+
+  private final PipelineResult delegate;
+  private final Runnable stopJobServer;
+
+  KafkaStreamsPipelineResult(PipelineResult delegate, Runnable stopJobServer) {
+    this.delegate = delegate;
+    this.stopJobServer = stopJobServer;
+  }
+
+  @Override
+  public State getState() {
+    return delegate.getState();
+  }
+
+  @Override
+  public State cancel() throws IOException {
+    State state = delegate.cancel();
+    stopJobServer.run();
+    return state;
+  }
+
+  @Override
+  public State waitUntilFinish(Duration duration) {
+    State state = delegate.waitUntilFinish(duration);
+    stopJobServer.run();
+    return state;
+  }
+
+  @Override
+  public State waitUntilFinish() {
+    State state = delegate.waitUntilFinish();
+    stopJobServer.run();
+    return state;
+  }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The `stopJobServer.run()` call should be placed in a `finally` block to 
ensure the embedded job server is stopped even if `delegate.waitUntilFinish()` 
throws an exception.
   
   ```suggestion
     @Override
     public State waitUntilFinish() {
       try {
         return delegate.waitUntilFinish();
       } finally {
         stopJobServer.run();
       }
     }
   ```
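A self-contained sketch of why the `finally` placement matters. The stop callback runs on both the normal and the exceptional path, and the delegate's exception still reaches the caller. `SimpleResult`, `StoppingResult`, and `FinallyDemo` are hypothetical stand-ins, not Beam types, so the snippet compiles without Beam on the classpath.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for Beam's PipelineResult; only the method under review is modeled.
interface SimpleResult {
  String waitUntilFinish();
}

// Mirrors the suggested wrapper: the stop callback lives in a finally block,
// so it runs whether the delegate returns normally or throws.
final class StoppingResult implements SimpleResult {
  private final SimpleResult delegate;
  private final Runnable stopJobServer;

  StoppingResult(SimpleResult delegate, Runnable stopJobServer) {
    this.delegate = delegate;
    this.stopJobServer = stopJobServer;
  }

  @Override
  public String waitUntilFinish() {
    try {
      return delegate.waitUntilFinish();
    } finally {
      stopJobServer.run();
    }
  }
}

final class FinallyDemo {
  public static void main(String[] args) {
    AtomicInteger stops = new AtomicInteger();
    SimpleResult failing = () -> {
      throw new IllegalStateException("job service unreachable");
    };
    try {
      new StoppingResult(failing, stops::incrementAndGet).waitUntilFinish();
    } catch (IllegalStateException expected) {
      // The delegate's exception still propagates to the caller.
    }
    System.out.println("stop calls after failure: " + stops.get());
  }
}
```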



##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineResult.java:
##########
@@ -0,0 +1,69 @@
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+/**
+ * Forwards {@link PipelineResult} calls to a delegate and stops an embedded job server when the
+ * pipeline reaches a terminal state.
+ */
+class KafkaStreamsPipelineResult implements PipelineResult {
+
+  private final PipelineResult delegate;
+  private final Runnable stopJobServer;
+
+  KafkaStreamsPipelineResult(PipelineResult delegate, Runnable stopJobServer) {
+    this.delegate = delegate;
+    this.stopJobServer = stopJobServer;
+  }
+
+  @Override
+  public State getState() {
+    return delegate.getState();
+  }
+
+  @Override
+  public State cancel() throws IOException {
+    State state = delegate.cancel();
+    stopJobServer.run();
+    return state;
+  }
+
+  @Override
+  public State waitUntilFinish(Duration duration) {
+    State state = delegate.waitUntilFinish(duration);
+    stopJobServer.run();
+    return state;
+  }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The `stopJobServer.run()` call should be placed in a `finally` block to 
ensure the embedded job server is stopped even if 
`delegate.waitUntilFinish(duration)` throws an exception. This prevents 
resource leaks if the pipeline monitoring fails.
   
   ```suggestion
     @Override
     public State waitUntilFinish(Duration duration) {
       try {
         return delegate.waitUntilFinish(duration);
       } finally {
         stopJobServer.run();
       }
     }
   ```
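One caveat, offered tentatively: the Beam Javadoc for `PipelineResult.waitUntilFinish(Duration)` describes the return value as the final state, or `null` on timeout. In that case both the current code and the suggestion would stop the embedded job server while the job may still be running, which does not match the class Javadoc's "when the pipeline reaches a terminal state". The sketch below keeps the `finally` guarantee for exceptions but skips the stop when the wait times out. `State`, `TimedResult`, and `TerminalAwareResult` are simplified, hypothetical stand-ins. `java.time.Duration` replaces Joda's `Duration` only so the snippet compiles on its own.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for Beam's PipelineResult.State (Beam has more states).
enum State {
  RUNNING(false), DONE(true), FAILED(true), CANCELLED(true);

  private final boolean terminal;

  State(boolean terminal) {
    this.terminal = terminal;
  }

  boolean isTerminal() {
    return terminal;
  }
}

// Stand-in for the timed wait; assumed to return null on timeout.
interface TimedResult {
  State waitUntilFinish(Duration duration);
}

final class TerminalAwareResult implements TimedResult {
  private final TimedResult delegate;
  private final Runnable stopJobServer;

  TerminalAwareResult(TimedResult delegate, Runnable stopJobServer) {
    this.delegate = delegate;
    this.stopJobServer = stopJobServer;
  }

  @Override
  public State waitUntilFinish(Duration duration) {
    // Starts as true so that an exception from the delegate still stops the server.
    boolean stop = true;
    try {
      State state = delegate.waitUntilFinish(duration);
      // A null or non-terminal state means the wait ended before the job did; keep the server up.
      stop = state != null && state.isTerminal();
      return state;
    } finally {
      if (stop) {
        stopJobServer.run();
      }
    }
  }
}
```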



##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineOptions.java:
##########
@@ -0,0 +1,76 @@
+package org.apache.beam.runners.kafka.streams;
+
+import java.nio.file.Paths;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PortablePipelineOptions;
+
+/** Pipeline options for the Kafka Streams runner. */
+public interface KafkaStreamsPipelineOptions extends PortablePipelineOptions {
+
+  @Description("Comma-separated list of host:port Kafka brokers used by the Kafka Streams client.")
+  @Default.String("localhost:9092")
+  String getBootstrapServers();
+
+  void setBootstrapServers(String bootstrapServers);
+
+  @Description(
+      "Kafka Streams application.id (must be unique for each distinct topology using the same "
+          + "input topics in a Kafka cluster).")
+  @Default.String("beam-kafka-streams-runner")

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The default `applicationId` is a constant string. If multiple Beam pipelines
are run against the same Kafka cluster using default options, they will share
one Kafka Streams application ID, and with it one consumer group and one set of
internal topic names. This can split input partitions between unrelated
pipelines or cause startup failures. Consider using a more unique default (e.g.,
incorporating a UUID) or documenting that this must be overridden for concurrent jobs.
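A minimal sketch of a UUID-based default, assuming it would be wired in through `@Default.InstanceFactory` the same way this file already wires `StateDirDefaultFactory`. `ApplicationIdDefaults` and `uniqueApplicationId` are hypothetical names, and the method is shown without Beam's `DefaultValueFactory` interface so it compiles standalone.

```java
import java.util.UUID;

// Hypothetical helper; in the runner this logic would sit in a
// DefaultValueFactory<String> referenced from @Default.InstanceFactory.
final class ApplicationIdDefaults {
  static final String PREFIX = "beam-kafka-streams-runner";

  private ApplicationIdDefaults() {}

  // Appends a random UUID so that two runs with default options do not share an application.id.
  // UUID strings contain only hex digits and '-', both allowed in a Kafka Streams application.id.
  static String uniqueApplicationId() {
    return PREFIX + "-" + UUID.randomUUID();
  }
}
```

The trade-off is worth weighing. Because Kafka Streams derives the consumer group and internal topic names from `application.id`, a random ID per run also means a restarted pipeline will not resume from its previously committed offsets or state. Documenting the required override may therefore suit long-running jobs better than a random default.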



##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsRunner.java:
##########
@@ -0,0 +1,101 @@
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.runners.portability.PortableRunner;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.PipelineRunner;
+import org.apache.beam.sdk.options.ExperimentalOptions;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.construction.Environments;
+import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Strings;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link PipelineRunner} that submits portable jobs to an in-process or external Beam job service
+ * backed by the Kafka Streams translation path.
+ */
+@SuppressWarnings({
+  "nullness" // TODO(https://github.com/apache/beam/issues/20497)
+})
+public class KafkaStreamsRunner extends PipelineRunner<PipelineResult> {
+
+  private static final Logger LOG = LoggerFactory.getLogger(KafkaStreamsRunner.class);
+
+  private final KafkaStreamsPipelineOptions pipelineOptions;
+
+  public static KafkaStreamsRunner fromOptions(PipelineOptions options) {
+    return new KafkaStreamsRunner(options.as(KafkaStreamsPipelineOptions.class));
+  }
+
+  protected KafkaStreamsRunner(KafkaStreamsPipelineOptions pipelineOptions) {
+    this.pipelineOptions = pipelineOptions;
+  }
+
+  @Override
+  public PipelineResult run(Pipeline pipeline) {
+    assignPortableDefaults(pipelineOptions);
+    KafkaStreamsJobServerDriver jobServerDriver = null;
+    try {
+      if (Strings.isNullOrEmpty(pipelineOptions.getJobEndpoint())) {
+        LOG.info("No job endpoint configured; starting an embedded Kafka Streams job server.");
+        KafkaStreamsJobServerDriver.KafkaStreamsServerConfiguration configuration =
+            new KafkaStreamsJobServerDriver.KafkaStreamsServerConfiguration();
+        configuration.setPort(0);
+        jobServerDriver = KafkaStreamsJobServerDriver.fromConfig(configuration);
+        pipelineOptions.setJobEndpoint(jobServerDriver.start());
+      }
+      PortableRunner portableRunner = PortableRunner.fromOptions(pipelineOptions);
+      PipelineResult result = portableRunner.run(pipeline);
+      if (jobServerDriver != null) {
+        return new KafkaStreamsPipelineResult(result, jobServerDriver::stop);
+      }
+      return result;
+    } catch (IOException e) {
+      if (jobServerDriver != null) {
+        jobServerDriver.stop();
+      }
+      throw new RuntimeException(e);
+    }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The `catch` block only handles `IOException`. If a `RuntimeException` occurs 
during `PortableRunner` initialization or job submission (e.g., in 
`portableRunner.run(pipeline)`), the `jobServerDriver` will not be stopped, 
leading to a resource leak of the embedded server. Consider catching
`Exception` so that unchecked failures during submission also stop the server.
   
   ```suggestion
       } catch (Exception e) {
         if (jobServerDriver != null) {
           jobServerDriver.stop();
         }
         if (e instanceof RuntimeException) {
           throw (RuntimeException) e;
         }
         throw new RuntimeException(e);
       }
   ```
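An alternative sketch, offered as a judgment call. A success flag checked in `finally` stops the server on any `Throwable`, including `Error`s that `catch (Exception e)` would miss, and avoids the `instanceof` rethrow. `JobServer`, `Submitter`, and `SubmitWithCleanup` are hypothetical stand-ins for `KafkaStreamsJobServerDriver` and the `PortableRunner` call; only the control flow is meant to carry over. Which calls actually throw `IOException` in the runner is an assumption here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for KafkaStreamsJobServerDriver.
interface JobServer {
  String start() throws IOException;

  void stop();
}

// Hypothetical stand-in for PortableRunner.run(pipeline).
interface Submitter {
  String submit(String endpoint) throws IOException;
}

final class SubmitWithCleanup {
  private SubmitWithCleanup() {}

  static String run(JobServer server, Submitter submitter) {
    // Set only once the server's lifetime is owned by the returned result
    // (in the runner, the KafkaStreamsPipelineResult wrapper).
    boolean handedOff = false;
    try {
      String endpoint = server.start();
      String result = submitter.submit(endpoint);
      handedOff = true;
      return result;
    } catch (IOException e) {
      // UncheckedIOException is a RuntimeException, so callers see the same unchecked type family.
      throw new UncheckedIOException(e);
    } finally {
      // Runs for IOException, other RuntimeExceptions, and Errors alike.
      if (!handedOff) {
        server.stop();
      }
    }
  }
}
```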



##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineOptions.java:
##########
@@ -0,0 +1,76 @@
+package org.apache.beam.runners.kafka.streams;
+
+import java.nio.file.Paths;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PortablePipelineOptions;
+
+/** Pipeline options for the Kafka Streams runner. */
+public interface KafkaStreamsPipelineOptions extends PortablePipelineOptions {
+
+  @Description("Comma-separated list of host:port Kafka brokers used by the Kafka Streams client.")
+  @Default.String("localhost:9092")
+  String getBootstrapServers();
+
+  void setBootstrapServers(String bootstrapServers);
+
+  @Description(
+      "Kafka Streams application.id (must be unique for each distinct topology using the same "
+          + "input topics in a Kafka cluster).")
+  @Default.String("beam-kafka-streams-runner")
+  String getApplicationId();
+
+  void setApplicationId(String applicationId);
+
+  @Description(
+      "Kafka Streams processing.guarantee setting, for example at_least_once or exactly_once_v2.")
+  @Default.String("exactly_once_v2")
+  String getProcessingGuarantee();
+
+  void setProcessingGuarantee(String processingGuarantee);
+
+  @Description("Soft cap on the number of elements per bundle.")
+  @Default.Integer(1000)
+  int getMaxBundleSize();
+
+  void setMaxBundleSize(int maxBundleSize);
+
+  @Description("Soft cap on bundle wall-clock duration in milliseconds.")
+  @Default.Integer(1000)
+  int getMaxBundleTimeMs();
+
+  void setMaxBundleTimeMs(int maxBundleTimeMs);
+
+  @Description("Directory where Kafka Streams stores local state.")
+  @Default.InstanceFactory(StateDirDefaultFactory.class)
+  String getStateDir();
+
+  void setStateDir(String stateDir);
+
+  /** Default {@link #getStateDir()} under the JVM temp directory. */
+  class StateDirDefaultFactory implements DefaultValueFactory<String> {
+    @Override
+    public String create(PipelineOptions options) {
+      return Paths.get(System.getProperty("java.io.tmpdir"), "beam-kafka-streams-state").toString();

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   The default `stateDir` uses a fixed path in the temporary directory. 
Multiple concurrent pipeline executions on the same host will conflict on this 
directory, which can cause lock contention or state corruption in Kafka 
Streams. Consider using a unique subdirectory per execution.
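A minimal sketch of a per-execution state directory, using `Files.createTempDirectory` under `java.io.tmpdir`. `StateDirDefaults` and `uniqueStateDir` are hypothetical names; in the runner, the body would replace `StateDirDefaultFactory.create(...)`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Hypothetical helper; its body would replace StateDirDefaultFactory.create(...).
final class StateDirDefaults {
  private StateDirDefaults() {}

  // Creates a fresh directory such as <java.io.tmpdir>/beam-kafka-streams-state-<random>
  // so that concurrent runs on one host never share a Kafka Streams state.dir.
  static String uniqueStateDir() {
    try {
      return Files.createTempDirectory("beam-kafka-streams-state-").toString();
    } catch (IOException e) {
      // DefaultValueFactory.create cannot throw checked exceptions.
      throw new UncheckedIOException(e);
    }
  }
}
```

Two caveats. First, as far as we understand, Kafka Streams already nests state under `<state.dir>/<application.id>`, so runs mostly collide here because they also share the constant default `applicationId`. A unique base directory removes that dependency. Second, a fresh directory per run means local state is rebuilt from changelog topics after a restart, and temp directories are not deleted automatically.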



##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineResult.java:
##########
@@ -0,0 +1,69 @@
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+/**
+ * Forwards {@link PipelineResult} calls to a delegate and stops an embedded job server when the
+ * pipeline reaches a terminal state.
+ */
+class KafkaStreamsPipelineResult implements PipelineResult {
+
+  private final PipelineResult delegate;
+  private final Runnable stopJobServer;
+
+  KafkaStreamsPipelineResult(PipelineResult delegate, Runnable stopJobServer) {
+    this.delegate = delegate;
+    this.stopJobServer = stopJobServer;
+  }
+
+  @Override
+  public State getState() {
+    return delegate.getState();
+  }
+
+  @Override
+  public State cancel() throws IOException {
+    State state = delegate.cancel();
+    stopJobServer.run();
+    return state;
+  }

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The `stopJobServer.run()` call should be placed in a `finally` block to 
ensure the embedded job server is stopped even if `delegate.cancel()` throws an 
`IOException`. This prevents resource leaks in the event of a failure during 
cancellation.
   
   ```suggestion
     @Override
     public State cancel() throws IOException {
       try {
         return delegate.cancel();
       } finally {
         stopJobServer.run();
       }
     }
   ```
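A related point. With `finally` blocks in `cancel()` and both `waitUntilFinish` overloads, a caller that cancels and then waits, which is a common shutdown sequence, will invoke `stopJobServer` twice. The diff does not show whether `KafkaStreamsJobServerDriver.stop()` tolerates repeated calls, so wrapping the callback in a run-once guard may be worth considering. `RunOnce` is a hypothetical helper name.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper: wraps a Runnable so that only the first run() call takes effect,
// even when cancel() and waitUntilFinish() are called from different threads.
final class RunOnce implements Runnable {
  private final AtomicBoolean done = new AtomicBoolean(false);
  private final Runnable action;

  RunOnce(Runnable action) {
    this.action = action;
  }

  @Override
  public void run() {
    // compareAndSet flips false -> true exactly once across all callers.
    if (done.compareAndSet(false, true)) {
      action.run();
    }
  }
}
```

In the wrapper's constructor this would read `this.stopJobServer = new RunOnce(stopJobServer);`. Note that the guard is set before the action runs, so a `stop()` that throws is not retried on a later call.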



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
