damccorm commented on code in PR #35677:
URL: https://github.com/apache/beam/pull/35677#discussion_r2298486500
##########
sdks/python/apache_beam/ml/transforms/embeddings/vertex_ai.py:
##########
@@ -281,3 +294,194 @@ def get_ptransform_for_processing(self, **kwargs) -> beam.PTransform:
     return RunInference(
         model_handler=_ImageEmbeddingHandler(self),
         inference_args=self.inference_args)
+
+
+@dataclass
+class VertexAIMultiModalInput:
+  image: Optional[Image] = None
+  video: Optional[Video] = None
+  contextual_text: Optional[str] = None
Review Comment:
> It is not part of the image object today, but given how get_embeddings() works there wouldn't be a way to specify that otherwise unless they really expanded the function signature.
Yeah, I recognize that there is no way of specifying this today. The advantage of using our own dataclass to capture information like this is that we could support it in a forward-compatible way through a single config object in the future, regardless of how Vertex's client decides to support it.
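As a rough sketch of what I mean (purely illustrative; the `per_input_options` field below is made up and is not part of this PR or the Vertex client today), the wrapper gives us one place to hang per-input options later without changing the public surface again:

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# Image/Video come from the Vertex AI SDK, same as in the PR.
from vertexai.vision_models import Image, Video


@dataclass
class VertexAIMultiModalInput:
  image: Optional[Image] = None
  video: Optional[Video] = None
  contextual_text: Optional[str] = None
  # Hypothetical future field, shown only to illustrate forward compatibility;
  # whatever per-input options Vertex eventually exposes could land here
  # without callers having to change how they construct their inputs.
  per_input_options: Optional[Dict[str, Any]] = None
```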
> There was some conversation with @claudevdm around expanding Chunks to be capable of containing multimodal data and writing a separate input/output adapter pair for them, but I suppose there's separate merit around just accepting text Chunks here.
I would lean towards unifying this now rather than adding another way of doing this wrapping in the future. That would also let this play nicely with writing (text) to a vector DB, though we'd still need to do similar work for images/videos eventually.
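To make the unification concrete, here is a minimal sketch of the text-Chunk side, assuming `Chunk`/`Content` in `apache_beam.ml.rag.types` keep their current shape (`chunk_to_multimodal_input` is just an illustrative name, not a proposed API):

```python
from apache_beam.ml.rag.types import Chunk


def chunk_to_multimodal_input(chunk: Chunk) -> VertexAIMultiModalInput:
  # Text-only for now: route the chunk's text into the contextual_text leg of
  # the multimodal input. Images/videos would need the Chunk extension
  # discussed above before they can flow through the same path.
  return VertexAIMultiModalInput(contextual_text=chunk.content.text)
```

Something like that would let the Chunk-based embedding/vector-DB path and this handler share one input type instead of growing a second wrapper later.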
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]