gemini-code-assist[bot] commented on code in PR #37377:
URL: https://github.com/apache/beam/pull/37377#discussion_r3079455108


##########
sdks/go/pkg/beam/runners/dataflow/dataflow.go:
##########
@@ -61,9 +61,11 @@ var (
        numWorkers             = flag.Int64("num_workers", 0, "Number of workers (optional).")
        workerHarnessThreads   = flag.Int64("number_of_worker_harness_threads", 0, "The number of threads per each worker harness process (optional).")
        maxNumWorkers          = flag.Int64("max_num_workers", 0, "Maximum number of workers during scaling (optional).")
-       diskSizeGb             = flag.Int64("disk_size_gb", 0, "Size of root disk for VMs, in GB (optional).")
-       diskType               = flag.String("disk_type", "", "Type of root disk for VMs (optional).")
-       autoscalingAlgorithm   = flag.String("autoscaling_algorithm", "", "Autoscaling mode to use (optional).")
+       diskSizeGb                     = flag.Int64("disk_size_gb", 0, "Size of root disk for VMs, in GB (optional).")
+       diskType                       = flag.String("disk_type", "", "Type of root disk for VMs (optional).")
+       diskProvisionedIops            = flag.Int64("disk_provisioned_iops", 0, "Provisioned IOPS for the worker disk (optional).")
+       diskProvisionedThroughputMibps = flag.Int64("disk_provisioned_throughput_mibps", 0, "Provisioned throughput in MiB/s for the worker disk (optional).")

Review Comment:
   ![high](https://www.gstatic.com/codereviewagent/high-priority.svg)
   
   The Go implementation appears to be incomplete. While flags and `JobOptions` 
fields have been added, there is no corresponding logic to translate these 
values into the `WorkerPool` configuration within the Dataflow API request 
(typically handled in `environment.go`). Without this translation, these flags 
will have no effect on the provisioned workers.
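
   For illustration only, here is a minimal sketch of the missing translation step, mirroring how `diskSizeGb` and `diskType` are applied to the worker pool today. The `jobOptions`/`workerPool` types and the `applyDiskPerformanceOptions` helper are hypothetical stand-ins for this sketch; the real change would set the corresponding fields on the generated `WorkerPool` type from `google.golang.org/api/dataflow/v1b3` in `environment.go`, and the exact field names there should be verified against the generated client.

   ```go
   package main

   import "fmt"

   // jobOptions is a hypothetical stand-in for the SDK's JobOptions; only the
   // two fields added by this PR are shown.
   type jobOptions struct {
       DiskProvisionedIops            int64
       DiskProvisionedThroughputMibps int64
   }

   // workerPool is a hypothetical stand-in for the Dataflow API WorkerPool message.
   type workerPool struct {
       DiskProvisionedIops            int64
       DiskProvisionedThroughputMibps int64
   }

   // applyDiskPerformanceOptions copies the provisioned disk performance settings
   // onto the worker pool, skipping zero values so that service defaults still
   // apply when the flags are left unset.
   func applyDiskPerformanceOptions(opts jobOptions, wp *workerPool) {
       if opts.DiskProvisionedIops > 0 {
           wp.DiskProvisionedIops = opts.DiskProvisionedIops
       }
       if opts.DiskProvisionedThroughputMibps > 0 {
           wp.DiskProvisionedThroughputMibps = opts.DiskProvisionedThroughputMibps
       }
   }

   func main() {
       opts := jobOptions{DiskProvisionedIops: 3000, DiskProvisionedThroughputMibps: 400}
       wp := &workerPool{}
       applyDiskPerformanceOptions(opts, wp)
       fmt.Printf("worker pool after translation: %+v\n", wp)
   }
   ```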



##########
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java:
##########
@@ -193,6 +193,20 @@ public String getAlgorithm() {
 
   void setWorkerDiskType(String value);
 
+  /** Provisioned IOPS for the worker disk. */
+  @Description("Provisioned IOPS for the worker disk.")
+  @Nullable
+  Integer getDiskProvisionedIops();
+
+  void setDiskProvisionedIops(Integer value);

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Consider using `Long` instead of `Integer` for these performance-related fields. While `Integer` is likely sufficient for current IOPS limits, `Long` is more consistent with the underlying Dataflow API (where the translator casts the value to `Long`) and with the Go implementation in this PR. It also future-proofs the options against disks with very high provisioned limits.
   
   ```suggestion
     Long getDiskProvisionedIops();
   
     void setDiskProvisionedIops(Long value);
   ```



##########
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/DataflowPipelineTranslator.java:
##########
@@ -489,6 +489,13 @@ public Job translate(List<DataflowPackage> packages) {
       if (options.getDiskSizeGb() > 0) {
         workerPool.setDiskSizeGb(options.getDiskSizeGb());
       }
+      if (options.getDiskProvisionedIops() != null) {
+        workerPool.setDiskProvisionedIops(Long.valueOf(options.getDiskProvisionedIops()));
+      }
+      if (options.getDiskProvisionedThroughputMibps() != null) {
+        workerPool.setDiskProvisionedThroughputMibps(
+            Long.valueOf(options.getDiskProvisionedThroughputMibps()));
+      }

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   For consistency with how `diskSizeGb` is handled (lines 489-491), it is recommended to also check that these values are greater than zero. This provides basic client-side validation and ensures that an explicit value of 0 does not inadvertently override service defaults. Additionally, the conversion to `Long` can be simplified.
   
   ```suggestion
         if (options.getDiskProvisionedIops() != null && options.getDiskProvisionedIops() > 0) {
           workerPool.setDiskProvisionedIops((long) options.getDiskProvisionedIops());
         }
         if (options.getDiskProvisionedThroughputMibps() != null
             && options.getDiskProvisionedThroughputMibps() > 0) {
           workerPool.setDiskProvisionedThroughputMibps(
               (long) options.getDiskProvisionedThroughputMibps());
         }
   ```



##########
runners/google-cloud-dataflow-java/src/main/java/org/apache/beam/runners/dataflow/options/DataflowPipelineWorkerPoolOptions.java:
##########
@@ -193,6 +193,20 @@ public String getAlgorithm() {
 
   void setWorkerDiskType(String value);
 
+  /** Provisioned IOPS for the worker disk. */
+  @Description("Provisioned IOPS for the worker disk.")
+  @Nullable
+  Integer getDiskProvisionedIops();
+
+  void setDiskProvisionedIops(Integer value);
+
+  /** Provisioned throughput in MiB/s for the worker disk. */
+  @Description("Provisioned throughput in MiB/s for the worker disk.")
+  @Nullable
+  Integer getDiskProvisionedThroughputMibps();
+
+  void setDiskProvisionedThroughputMibps(Integer value);

Review Comment:
   ![medium](https://www.gstatic.com/codereviewagent/medium-priority.svg)
   
   Consider using `Long` for throughput as well, for consistency and 
future-proofing.
   
   ```suggestion
     Long getDiskProvisionedThroughputMibps();
   
     void setDiskProvisionedThroughputMibps(Long value);
   ```


