[
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268601#comment-16268601
]
Etienne Chauchot commented on BEAM-3201:
----------------------------------------
Hi [~nerdynick]. OK, so the partition transform does not fit your use case.
Of course, the JSON string will be deserialized only once, inside
{{writeFn.ProcessElement}}, and the deserialized object will be passed to the
three {{with[id|type|index]Fn}} functions. The deserialized object cannot be a
Jackson JSONObject because that class is not serializable, which would prevent
Beam from calling the three user-defined {{with[id|type|index]Fn}} functions.
We can choose any object representation of JSON as long as it is serializable.
The {{with[id|type|index]Fn}} functions will take this object representation
as a parameter and each output a {{String}} value (the id, index or type value,
respectively) that the user derives from the object representation of the ES
document. Beam will not add the _id, _type and _index metadata fields to, or
remove them from, the message payload in Read and Write (to avoid
deserializing, parsing and re-serializing it).
However, if users want to add these fields to their documents in order to
read them back in {{with[id|type|index]Fn}}, or simply to derive the values
from other fields, that is fine; these fields would then be stored as part of
the payload (Beam leaves the document untouched).
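A minimal sketch of the write path described above, in plain Java without Beam or Jackson so that it runs on its own. Everything in it is hypothetical and is not the ElasticsearchIO API: `DocumentIdSketch`, the `FieldFn` and `Parser` interfaces, `toBulkLines`, and `parseFlat` (a toy parser for flat objects with string values only). It assumes a `HashMap` as the serializable JSON representation, parses each document exactly once, hands the same map to the id, type and index functions, and emits an Elasticsearch bulk action line followed by the original, untouched payload.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: parse once, derive _id/_type/_index, pass the payload through. */
public class DocumentIdSketch {

  /** User function deriving one metadata value from the parsed document (hypothetical). */
  public interface FieldFn extends Function<Map<String, Object>, String>, Serializable {}

  /** Turns the JSON string into a serializable representation (hypothetical). */
  public interface Parser extends Function<String, HashMap<String, Object>>, Serializable {}

  private final Parser parser;
  private final FieldFn idFn;
  private final FieldFn typeFn;
  private final FieldFn indexFn;

  public DocumentIdSketch(Parser parser, FieldFn idFn, FieldFn typeFn, FieldFn indexFn) {
    this.parser = parser;
    this.idFn = idFn;
    this.typeFn = typeFn;
    this.indexFn = indexFn;
  }

  /** Returns the bulk action line and the original document, which is not modified. */
  public String[] toBulkLines(String json) {
    Map<String, Object> doc = parser.apply(json); // deserialize exactly once
    String action = String.format(
        "{\"index\":{\"_index\":\"%s\",\"_type\":\"%s\",\"_id\":\"%s\"}}",
        indexFn.apply(doc), typeFn.apply(doc), idFn.apply(doc)); // values are not JSON-escaped
    return new String[] {action, json};
  }

  /** Toy parser for flat objects: string values only, no escapes, no commas or colons in strings. */
  public static HashMap<String, Object> parseFlat(String json) {
    HashMap<String, Object> out = new HashMap<>();
    String body = json.trim();
    body = body.substring(1, body.length() - 1).trim(); // strip the surrounding braces
    if (body.isEmpty()) {
      return out;
    }
    for (String pair : body.split(",")) {
      String[] kv = pair.split(":", 2);
      out.put(unquote(kv[0]), unquote(kv[1]));
    }
    return out;
  }

  private static String unquote(String s) {
    String t = s.trim();
    return t.substring(1, t.length() - 1); // drop the surrounding double quotes
  }

  public static void main(String[] args) {
    DocumentIdSketch write = new DocumentIdSketch(
        DocumentIdSketch::parseFlat,
        doc -> (String) doc.get("user_id"),
        doc -> "tweet",
        doc -> "tweets-" + doc.get("country"));
    for (String line : write.toBulkLines("{\"user_id\":\"42\",\"country\":\"fr\"}")) {
      System.out.println(line);
    }
  }
}
```

Running `main` prints `{"index":{"_index":"tweets-fr","_type":"tweet","_id":"42"}}` followed by the unchanged document `{"user_id":"42","country":"fr"}`. The action line follows the `_index`/`_type`/`_id` shape of the Elasticsearch bulk API as of the 5.x/6.x releases; because the derived values are inserted without JSON escaping, the sketch assumes they contain no quotes or backslashes.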
> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch
> generates a document id for each record inserted, so each insertion is
> treated as a new document. Users want to be able to update documents using
> the IO. So, for the write part of the IO, users should be able to provide a
> document id so that they can update already stored documents. Providing an
> id for the documents could also help users achieve idempotency.
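The idempotency point in the issue description can be illustrated with a toy in-memory model. This is plain Java and entirely hypothetical: `IdempotencySketch` is neither ESIO nor an Elasticsearch client, and a `LinkedHashMap` stands in for an index. Writes without an id get a freshly generated id (standing in for Elasticsearch's auto-generated ids), so retrying a bundle duplicates documents; writes with a caller-provided id replace the stored document, so the retry is harmless.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.UUID;

/** Hypothetical toy model of an index, used only to show why explicit ids help idempotency. */
public class IdempotencySketch {

  /** Stand-in for an Elasticsearch index: document id -> document source. */
  private final Map<String, String> docs = new LinkedHashMap<>();

  /** No id supplied: a random id is generated, so every call adds a new document. */
  public void indexWithoutId(String source) {
    docs.put(UUID.randomUUID().toString(), source);
  }

  /** Id supplied: writing the same id again replaces the stored document. */
  public void indexWithId(String id, String source) {
    docs.put(id, source);
  }

  public int count() {
    return docs.size();
  }

  public static void main(String[] args) {
    IdempotencySketch auto = new IdempotencySketch();
    IdempotencySketch explicit = new IdempotencySketch();
    // Simulate a retried bundle: the same two records are written twice.
    for (int attempt = 0; attempt < 2; attempt++) {
      auto.indexWithoutId("{\"user\":\"a\"}");
      auto.indexWithoutId("{\"user\":\"b\"}");
      explicit.indexWithId("a", "{\"user\":\"a\"}");
      explicit.indexWithId("b", "{\"user\":\"b\"}");
    }
    System.out.println("auto ids:     " + auto.count() + " documents"); // 4 (duplicates)
    System.out.println("explicit ids: " + explicit.count() + " documents"); // 2
  }
}
```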
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)