gemini-code-assist[bot] commented on code in PR #38534: URL: https://github.com/apache/beam/pull/38534#discussion_r3263998528
##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineResult.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+/**
+ * Forwards {@link PipelineResult} calls to a delegate and stops an embedded job server when the
+ * pipeline reaches a terminal state.
+ */
+class KafkaStreamsPipelineResult implements PipelineResult {
+
+  private final PipelineResult delegate;
+  private final Runnable stopJobServer;
+
+  KafkaStreamsPipelineResult(PipelineResult delegate, Runnable stopJobServer) {
+    this.delegate = delegate;
+    this.stopJobServer = stopJobServer;
+  }
+
+  @Override
+  public State getState() {
+    return delegate.getState();
+  }
+
+  @Override
+  public State cancel() throws IOException {
+    State state = delegate.cancel();
+    stopJobServer.run();
+    return state;
+  }
+
+  @Override
+  public State waitUntilFinish(Duration duration) {
+    State state = delegate.waitUntilFinish(duration);
+    stopJobServer.run();
+    return state;
+  }
+
+  @Override
+  public State waitUntilFinish() {
+    State state = delegate.waitUntilFinish();
+    stopJobServer.run();
+    return state;
+  }

Review Comment:
The `stopJobServer.run()` call should be placed in a `finally` block to ensure the embedded job server is stopped even if `delegate.waitUntilFinish()` throws an exception.

```suggestion
  @Override
  public State waitUntilFinish() {
    try {
      return delegate.waitUntilFinish();
    } finally {
      stopJobServer.run();
    }
  }
```

##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineResult.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+/**
+ * Forwards {@link PipelineResult} calls to a delegate and stops an embedded job server when the
+ * pipeline reaches a terminal state.
+ */
+class KafkaStreamsPipelineResult implements PipelineResult {
+
+  private final PipelineResult delegate;
+  private final Runnable stopJobServer;
+
+  KafkaStreamsPipelineResult(PipelineResult delegate, Runnable stopJobServer) {
+    this.delegate = delegate;
+    this.stopJobServer = stopJobServer;
+  }
+
+  @Override
+  public State getState() {
+    return delegate.getState();
+  }
+
+  @Override
+  public State cancel() throws IOException {
+    State state = delegate.cancel();
+    stopJobServer.run();
+    return state;
+  }
+
+  @Override
+  public State waitUntilFinish(Duration duration) {
+    State state = delegate.waitUntilFinish(duration);
+    stopJobServer.run();
+    return state;
+  }

Review Comment:
The `stopJobServer.run()` call should be placed in a `finally` block to ensure the embedded job server is stopped even if `delegate.waitUntilFinish(duration)` throws an exception. This prevents resource leaks if the pipeline monitoring fails.

```suggestion
  @Override
  public State waitUntilFinish(Duration duration) {
    try {
      return delegate.waitUntilFinish(duration);
    } finally {
      stopJobServer.run();
    }
  }
```

##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineOptions.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.nio.file.Paths;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PortablePipelineOptions;
+
+/** Pipeline options for the Kafka Streams runner. */
+public interface KafkaStreamsPipelineOptions extends PortablePipelineOptions {
+
+  @Description("Comma-separated list of host:port Kafka brokers used by the Kafka Streams client.")
+  @Default.String("localhost:9092")
+  String getBootstrapServers();
+
+  void setBootstrapServers(String bootstrapServers);
+
+  @Description(
+      "Kafka Streams application.id (must be unique for each distinct topology using the same "
+          + "input topics in a Kafka cluster).")
+  @Default.String("beam-kafka-streams-runner")

Review Comment:
The default `applicationId` is a constant string. If multiple Beam pipelines are run against the same Kafka cluster using default options, they will collide on the same Kafka Streams application ID, leading to incorrect behavior or startup failures. Consider using a more unique default (e.g., incorporating a UUID) or documenting that this must be overridden for concurrent jobs.
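
One way to read the UUID suggestion above is sketched here using only the JDK. The class `UniqueApplicationIds` and its `create` method are hypothetical names introduced for illustration and are not part of the PR. In the runner, the same logic would presumably sit inside a `DefaultValueFactory<String>`, similar to the existing `StateDirDefaultFactory`.

```java
import java.util.UUID;

/** Hypothetical helper (not part of the PR): derives a per-run Kafka Streams application.id. */
final class UniqueApplicationIds {

  private UniqueApplicationIds() {}

  /**
   * Appends a random UUID to the given prefix so that two pipelines started with default
   * options do not end up sharing one application.id.
   */
  static String create(String prefix) {
    return prefix + "-" + UUID.randomUUID();
  }

  public static void main(String[] args) {
    // Prints something like beam-kafka-streams-runner-<random uuid>.
    System.out.println(create("beam-kafka-streams-runner"));
  }
}
```

The trade-off is that Kafka Streams also uses `application.id` as the consumer group ID and as the prefix for internal topics. A job restarted with a freshly generated ID would therefore not resume from the offsets or state of the previous run. For long-lived jobs, documenting that the option must be set explicitly may be the safer choice.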
##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsRunner.java:
##########
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+import org.apache.beam.runners.portability.PortableRunner;
+import org.apache.beam.sdk.Pipeline;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.PipelineRunner;
+import org.apache.beam.sdk.options.ExperimentalOptions;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.util.construction.Environments;
+import org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.base.Strings;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * A {@link PipelineRunner} that submits portable jobs to an in-process or external Beam job service
+ * backed by the Kafka Streams translation path.
+ */
+@SuppressWarnings({
+  "nullness" // TODO(https://github.com/apache/beam/issues/20497)
+})
+public class KafkaStreamsRunner extends PipelineRunner<PipelineResult> {
+
+  private static final Logger LOG = LoggerFactory.getLogger(KafkaStreamsRunner.class);
+
+  private final KafkaStreamsPipelineOptions pipelineOptions;
+
+  public static KafkaStreamsRunner fromOptions(PipelineOptions options) {
+    return new KafkaStreamsRunner(options.as(KafkaStreamsPipelineOptions.class));
+  }
+
+  protected KafkaStreamsRunner(KafkaStreamsPipelineOptions pipelineOptions) {
+    this.pipelineOptions = pipelineOptions;
+  }
+
+  @Override
+  public PipelineResult run(Pipeline pipeline) {
+    assignPortableDefaults(pipelineOptions);
+    KafkaStreamsJobServerDriver jobServerDriver = null;
+    try {
+      if (Strings.isNullOrEmpty(pipelineOptions.getJobEndpoint())) {
+        LOG.info("No job endpoint configured; starting an embedded Kafka Streams job server.");
+        KafkaStreamsJobServerDriver.KafkaStreamsServerConfiguration configuration =
+            new KafkaStreamsJobServerDriver.KafkaStreamsServerConfiguration();
+        configuration.setPort(0);
+        jobServerDriver = KafkaStreamsJobServerDriver.fromConfig(configuration);
+        pipelineOptions.setJobEndpoint(jobServerDriver.start());
+      }
+      PortableRunner portableRunner = PortableRunner.fromOptions(pipelineOptions);
+      PipelineResult result = portableRunner.run(pipeline);
+      if (jobServerDriver != null) {
+        return new KafkaStreamsPipelineResult(result, jobServerDriver::stop);
+      }
+      return result;
+    } catch (IOException e) {
+      if (jobServerDriver != null) {
+        jobServerDriver.stop();
+      }
+      throw new RuntimeException(e);
+    }

Review Comment:
The `catch` block only handles `IOException`. If a `RuntimeException` occurs during `PortableRunner` initialization or job submission (e.g., in `portableRunner.run(pipeline)`), the `jobServerDriver` will not be stopped, leading to a resource leak of the embedded server. Consider catching `Exception` to cover all failure scenarios during submission.

```suggestion
    } catch (Exception e) {
      if (jobServerDriver != null) {
        jobServerDriver.stop();
      }
      if (e instanceof RuntimeException) {
        throw (RuntimeException) e;
      }
      throw new RuntimeException(e);
    }
```

##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineOptions.java:
##########
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.nio.file.Paths;
+import org.apache.beam.sdk.options.Default;
+import org.apache.beam.sdk.options.DefaultValueFactory;
+import org.apache.beam.sdk.options.Description;
+import org.apache.beam.sdk.options.PipelineOptions;
+import org.apache.beam.sdk.options.PortablePipelineOptions;
+
+/** Pipeline options for the Kafka Streams runner. */
+public interface KafkaStreamsPipelineOptions extends PortablePipelineOptions {
+
+  @Description("Comma-separated list of host:port Kafka brokers used by the Kafka Streams client.")
+  @Default.String("localhost:9092")
+  String getBootstrapServers();
+
+  void setBootstrapServers(String bootstrapServers);
+
+  @Description(
+      "Kafka Streams application.id (must be unique for each distinct topology using the same "
+          + "input topics in a Kafka cluster).")
+  @Default.String("beam-kafka-streams-runner")
+  String getApplicationId();
+
+  void setApplicationId(String applicationId);
+
+  @Description(
+      "Kafka Streams processing.guarantee setting, for example at_least_once or exactly_once_v2.")
+  @Default.String("exactly_once_v2")
+  String getProcessingGuarantee();
+
+  void setProcessingGuarantee(String processingGuarantee);
+
+  @Description("Soft cap on the number of elements per bundle.")
+  @Default.Integer(1000)
+  int getMaxBundleSize();
+
+  void setMaxBundleSize(int maxBundleSize);
+
+  @Description("Soft cap on bundle wall-clock duration in milliseconds.")
+  @Default.Integer(1000)
+  int getMaxBundleTimeMs();
+
+  void setMaxBundleTimeMs(int maxBundleTimeMs);
+
+  @Description("Directory where Kafka Streams stores local state.")
+  @Default.InstanceFactory(StateDirDefaultFactory.class)
+  String getStateDir();
+
+  void setStateDir(String stateDir);
+
+  /** Default {@link #getStateDir()} under the JVM temp directory. */
+  class StateDirDefaultFactory implements DefaultValueFactory<String> {
+    @Override
+    public String create(PipelineOptions options) {
+      return Paths.get(System.getProperty("java.io.tmpdir"), "beam-kafka-streams-state").toString();

Review Comment:
The default `stateDir` uses a fixed path in the temporary directory. Multiple concurrent pipeline executions on the same host will conflict on this directory, which can cause lock contention or state corruption in Kafka Streams. Consider using a unique subdirectory per execution.
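
A minimal sketch of the "unique subdirectory per execution" idea follows, again using only the JDK. `UniqueStateDirs` and `create` are hypothetical names and not part of the PR. The body of `create` is roughly what `StateDirDefaultFactory.create` could return instead of the fixed path, assuming a random suffix is acceptable for this runner.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.UUID;

/** Hypothetical helper (not part of the PR): builds a per-execution state directory path. */
final class UniqueStateDirs {

  /** Base directory name taken from the default in the PR. */
  static final String BASE_DIR_NAME = "beam-kafka-streams-state";

  private UniqueStateDirs() {}

  /**
   * Returns <java.io.tmpdir>/beam-kafka-streams-state/<random uuid>. The directory itself is
   * not created here; only the path is computed.
   */
  static Path create() {
    return Paths.get(
        System.getProperty("java.io.tmpdir"), BASE_DIR_NAME, UUID.randomUUID().toString());
  }

  public static void main(String[] args) {
    System.out.println(create());
  }
}
```

A random subdirectory means local state is not reused when the same job restarts on the same host, so stores would have to be restored from their changelog topics. If restart locality matters, deriving the subdirectory from a stable, user-supplied identifier may be preferable to a UUID.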
##########
runners/kafka-streams/src/main/java/org/apache/beam/runners/kafka/streams/KafkaStreamsPipelineResult.java:
##########
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.beam.runners.kafka.streams;
+
+import java.io.IOException;
+import org.apache.beam.sdk.PipelineResult;
+import org.apache.beam.sdk.metrics.MetricResults;
+import org.joda.time.Duration;
+
+/**
+ * Forwards {@link PipelineResult} calls to a delegate and stops an embedded job server when the
+ * pipeline reaches a terminal state.
+ */
+class KafkaStreamsPipelineResult implements PipelineResult {
+
+  private final PipelineResult delegate;
+  private final Runnable stopJobServer;
+
+  KafkaStreamsPipelineResult(PipelineResult delegate, Runnable stopJobServer) {
+    this.delegate = delegate;
+    this.stopJobServer = stopJobServer;
+  }
+
+  @Override
+  public State getState() {
+    return delegate.getState();
+  }
+
+  @Override
+  public State cancel() throws IOException {
+    State state = delegate.cancel();
+    stopJobServer.run();
+    return state;
+  }

Review Comment:
The `stopJobServer.run()` call should be placed in a `finally` block to ensure the embedded job server is stopped even if `delegate.cancel()` throws an `IOException`. This prevents resource leaks in the event of a failure during cancellation.

```suggestion
  @Override
  public State cancel() throws IOException {
    try {
      return delegate.cancel();
    } finally {
      stopJobServer.run();
    }
  }
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
