damccorm commented on code in PR #38701:
URL: https://github.com/apache/beam/pull/38701#discussion_r3311091617


##########
sdks/python/test-suites/dataflow/common.gradle:
##########
@@ -478,6 +478,26 @@ def vllmTests = tasks.create("vllmTests") {
     executable 'sh'
     args '-c', ". ${envdir}/bin/activate && pip install openai && python -m apache_beam.examples.inference.vllm_text_completion $cmdArgs --chat true --chat_template 'gs://apache-beam-ml/additional_files/sample_chat_template.jinja' --experiment='worker_accelerator=type:nvidia-tesla-t4;count:1;install-nvidia-driver:5xx'"
   }
+  // TODO(https://github.com/apache/beam/pull/36966): Add Dataflow IT
+  // coverage for the embedded Dynamo path. The example pipeline already
+  // supports --use_dynamo (see VLLMCompletionsModelHandler). Enabling this
+  // requires updating vllm.dockerfile.old to install etcd and
+  // ai-dynamo[vllm], and provisioning a GPU pool with enough memory for
+  // the model plus the Dynamo runtime (e.g. NVIDIA L4 on g2-standard-4).
+  // The embedded mode was validated end-to-end on a T4 VM with
+  // Qwen/Qwen3-0.6B via DirectRunner; this change is scoped to the

Review Comment:
   Given that this was validated locally with a T4, why do we need an L4?
   
   We should try running this on Dataflow, which has built-in accelerator support; that way we don't need to provision our own pool.
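
   For illustration only: the existing `vllmTests` task quoted above already requests a T4 from Dataflow through the `worker_accelerator` experiment, so a Dynamo variant could presumably reuse the same flag. Below is a minimal sketch of composing that flag; the helper name `worker_accelerator_experiment` is hypothetical and not part of Beam, and the default values simply mirror the string in the diff.

```python
def worker_accelerator_experiment(gpu_type: str, count: int = 1, driver: str = "5xx") -> str:
    """Compose the --experiment flag that asks Dataflow for worker GPUs.

    Hypothetical helper (not a Beam API). The layout of the value mirrors the
    flag already used by the vllmTests task in common.gradle.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    spec = f"type:{gpu_type};count:{count};install-nvidia-driver:{driver}"
    return f"--experiment=worker_accelerator={spec}"


# Same accelerator request as the existing T4-based vLLM test.
flag = worker_accelerator_experiment("nvidia-tesla-t4")
print(flag)
```

   The value contains semicolons, so it needs quoting when passed through a shell, as the Gradle task does with single quotes.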



