jrmccluskey commented on code in PR #35677:
URL: https://github.com/apache/beam/pull/35677#discussion_r2285680282
##########
sdks/python/apache_beam/ml/transforms/embeddings/vertex_ai.py:
##########

```diff
@@ -281,3 +294,194 @@ def get_ptransform_for_processing(self, **kwargs) -> beam.PTransform:
     return RunInference(
         model_handler=_ImageEmbeddingHandler(self),
         inference_args=self.inference_args)
+
+
+@dataclass
+class VertexAIMultiModalInput:
+  image: Optional[Image] = None
+  video: Optional[Video] = None
+  contextual_text: Optional[str] = None
```

Review Comment:

For something like making mimeType specifiable, that would likely be composed in the Image object (if you look at get_embeddings() [from Vertex](https://cloud.google.com/vertex-ai/generative-ai/docs/reference/python/1.71.0/vertexai.vision_models.MultiModalEmbeddingModel#vertexai_vision_models_MultiModalEmbeddingModel_get_embeddings), those objects are how we construct the request), not something we'd have to support explicitly. If we really wanted to support it now, we'd be rolling our own request code, which I don't think is in our best interest.

My main hangup with adding a wrapper around each field is the ease of use of the transform. The whole point of adding the input adapters was so users wouldn't have to compose the input dataclass themselves: they just pass us the fields and we do the rest (see the sketches below). I do worry about overcomplicating this in anticipation of _maybe_ having additional configuration options added to Vertex later.

Admittedly, I'm not as familiar with multimodal embedding frameworks beyond Vertex, so maybe that is a reasonable expectation.
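To make the request-composition point concrete, here is a rough sketch of the Vertex SDK call shape from the docs linked above (not code from this PR; the model name is the documented multimodal embedding model, while the file path, text, and `dimension` value are placeholders):

```python
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# Assumes vertexai.init(project=..., location=...) has already been called.
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# The Image object carries the asset itself, so per-asset concerns like
# mimeType live on (or are inferred by) this object rather than on our
# input dataclass.
image = Image.load_from_file("path/to/image.jpg")

response = model.get_embeddings(
    image=image,
    contextual_text="a photo of a cat",
    dimension=1408,  # optional output dimensionality supported by the API
)
print(response.image_embedding)
print(response.text_embedding)
```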
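And the input-adapter idea is essentially the following (a hypothetical sketch, not the PR's implementation: `make_multimodal_input` and the `*_uri` parameter names are illustrative, and the dataclass mirrors the one added in this diff):

```python
from dataclasses import dataclass
from typing import Optional

from vertexai.vision_models import Image, Video


@dataclass
class VertexAIMultiModalInput:
  # Same shape as the dataclass added in this diff.
  image: Optional[Image] = None
  video: Optional[Video] = None
  contextual_text: Optional[str] = None


def make_multimodal_input(
    image_uri: Optional[str] = None,
    video_uri: Optional[str] = None,
    contextual_text: Optional[str] = None,
) -> VertexAIMultiModalInput:
  # Users hand us plain fields; we compose the dataclass (and the
  # underlying vision_models objects) on their behalf.
  return VertexAIMultiModalInput(
      image=Image.load_from_file(image_uri) if image_uri else None,
      video=Video.load_from_file(video_uri) if video_uri else None,
      contextual_text=contextual_text,
  )
```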