damccorm commented on code in PR #35677:
URL: https://github.com/apache/beam/pull/35677#discussion_r2292012846

##########
sdks/python/apache_beam/ml/transforms/embeddings/vertex_ai.py:
##########
@@ -281,3 +294,194 @@ def get_ptransform_for_processing(self, **kwargs) -> beam.PTransform:
     return RunInference(
         model_handler=_ImageEmbeddingHandler(self),
         inference_args=self.inference_args)
+
+
+@dataclass
+class VertexAIMultiModalInput:
+  image: Optional[Image] = None
+  video: Optional[Video] = None
+  contextual_text: Optional[str] = None

Review Comment:

> So for something like the mimeType being specifiable that would likely be something composed in the Image object (if you look at get_embeddings() [from Vertex](https://cloud.google.com/vertex-ai/generative-ai/docs/reference/python/1.71.0/vertexai.vision_models.MultiModalEmbeddingModel#vertexai_vision_models_MultiModalEmbeddingModel_get_embeddings) those objects are how we construct the request) and not something we'd have to explicitly support

Is it part of the Image object today? I agree that is one way it could be supported, but it's not clear that Vertex would definitely choose to support it that way.

I'm also thinking this would play nicely with Chunks for text (basically, we could have wrappers for all 3 object types, which seems fairly natural).
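
To make the wrapper idea concrete, here is a rough sketch of what "wrappers for all 3 object types" might look like. Every name below (`ImageInput`, `VideoInput`, `TextInput`, `MultiModalInput`, `present_modalities`) is hypothetical and not taken from the PR or from the Vertex SDK. The `Any`-typed fields stand in for the Vertex `Image` and `Video` objects, `TextInput` stands in for a Chunk, and `mime_type` is only an example of metadata that could live on a wrapper if Vertex does not put it on `Image` itself.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


# Hypothetical wrappers for illustration only; none of these names
# come from the PR or from the Vertex AI SDK.
@dataclass
class ImageInput:
  image: Any  # stand-in for a Vertex Image object
  mime_type: Optional[str] = None  # example of per-image metadata


@dataclass
class VideoInput:
  video: Any  # stand-in for a Vertex Video object


@dataclass
class TextInput:
  text: str  # stand-in for a Chunk carrying the contextual text


@dataclass
class MultiModalInput:
  image: Optional[ImageInput] = None
  video: Optional[VideoInput] = None
  contextual_text: Optional[TextInput] = None

  def present_modalities(self) -> List[str]:
    """Returns the names of the fields that were actually supplied."""
    fields = ("image", "video", "contextual_text")
    return [name for name in fields if getattr(self, name) is not None]
```

With this shape, any new per-modality option would be added to the relevant wrapper, and the top-level input class would keep the same three fields.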