Sergei Lilichenko created BEAM-10692:
----------------------------------------
Summary: Change org.apache.beam.sdk.extensions.ml.CloudVision to
associate the AnnotateImageResponses with the image data used for the annotation
Key: BEAM-10692
URL: https://issues.apache.org/jira/browse/BEAM-10692
Project: Beam
Issue Type: New Feature
Components: extensions-java-gcp
Affects Versions: 2.22.0
Reporter: Sergei Lilichenko
The design of this transform has a problem. It takes a
PCollection<String> as input (in the case of GCS URIs) and outputs a
PCollection<List<AnnotateImageResponse>>, so there is no way to associate the
responses with the original file URIs.
[ImageAnnotationContext|https://cloud.google.com/vision/docs/reference/rest/v1/AnnotateImageResponse#ImageAnnotationContext]
is returned as part of the response, but its "uri" field is empty for the
majority of annotations (it appears to be populated only for file annotations,
not for image annotations).
One approach is to return KV<String, List<AnnotateImageResponse>> for images
referenced by URI, where the key is the GCS URI. For images passed as bytes,
the caller would supply an id of any type and the transform would return
KV<IDTYPE, List<AnnotateImageResponse>>.
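
A minimal sketch of the keyed-output idea, written in plain Java so it runs without Beam on the classpath. It is not the CloudVision implementation: the class KeyedAnnotations and the method annotateKeyed are hypothetical names, Map.Entry stands in for Beam's KV, and the type parameter R stands in for AnnotateImageResponse. The only point is that each result carries the key (a GCS URI, or a caller-chosen id for raw bytes) it was produced from.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not part of the Beam SDK. */
final class KeyedAnnotations {
  private KeyedAnnotations() {}

  /**
   * Pairs every input key with the responses produced for it.
   *
   * K is the key type (String for GCS URIs, or any id type for raw bytes).
   * R is a stand-in for AnnotateImageResponse.
   * Map.Entry is a stand-in for Beam's KV.
   */
  static <K, R> List<Map.Entry<K, List<R>>> annotateKeyed(
      List<K> keys, Function<K, List<R>> annotator) {
    List<Map.Entry<K, List<R>>> out = new ArrayList<>(keys.size());
    for (K key : keys) {
      // Keep the key next to its responses so downstream steps can join on it.
      out.add(new SimpleImmutableEntry<>(key, annotator.apply(key)));
    }
    return out;
  }
}
```

In an actual pipeline the same shape would presumably come from a transform that emits KV<K, List<AnnotateImageResponse>> elements. The annotator function here is a placeholder for the Vision API call.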