[jira] [Commented] (BEAM-3201) ElasticsearchIO should deal with documents id

Etienne Chauchot (JIRA) Tue, 28 Nov 2017 03:31:51 -0800

    [ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16268601#comment-16268601
 ]


Etienne Chauchot commented on BEAM-3201:
----------------------------------------

Hi [~nerdynick]. OK, understood: the partition transform does not fit your use 
case. Of course, deserialization of the JSON string will be done only once, 
inside {{writeFn.ProcessElement}}, and the deserialized object will be passed 
to the three {{with[id|type|index]Fn}} functions. The deserialized object 
cannot be a Jackson JSONObject because it is not serializable, which would 
prevent Beam from calling the three user-defined {{with[id|type|index]Fn}} 
functions. We can choose whatever object representation of JSON we like, as 
long as it is serializable. 
The {{with[id|type|index]Fn}} functions will take this object representation 
as a parameter and output a {{String}} value (the id, index, or type 
respectively), determined by the user from the object representation of the ES 
document. Beam will not add or remove the _id, _type, and _index metadata in 
the message payload on Read or Write (to avoid a deserialize/parse/re-serialize 
cycle). If the user wants to add these fields to their documents in order to 
read them back later in {{with[id|type|index]Fn}}, or simply to derive their 
values from other fields, that is fine, but the fields will then be stored as 
part of the payload (Beam leaves the document untouched).
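
To make the idea concrete, here is a rough sketch in plain Java. It is not 
actual ElasticsearchIO code: the names {{DocMetadataSketch}}, {{FieldFn}} and 
{{bulkActionLine}} are hypothetical, and a {{Map<String, String>}} (a 
serializable {{HashMap}} in practice) stands in for whatever serializable JSON 
representation we end up choosing. Parsing the JSON string is omitted; the 
sketch assumes it has already been done once. It only shows the three 
user-defined functions being applied to the same deserialized object to build 
the metadata line of an Elasticsearch bulk request, while the payload itself 
is not modified.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class DocMetadataSketch {

  /** Hypothetical serializable function type, so a runner could ship it to workers. */
  public interface FieldFn extends Serializable {
    String apply(Map<String, String> doc);
  }

  /**
   * Builds the action/metadata line of a bulk request for one document.
   * The document is read but never modified. Values are not JSON-escaped
   * here; a real implementation would need to escape them.
   */
  public static String bulkActionLine(
      Map<String, String> doc, FieldFn idFn, FieldFn typeFn, FieldFn indexFn) {
    return String.format(
        "{ \"index\" : { \"_index\" : \"%s\", \"_type\" : \"%s\", \"_id\" : \"%s\" } }",
        indexFn.apply(doc), typeFn.apply(doc), idFn.apply(doc));
  }

  public static void main(String[] args) {
    // Stand-in for the document deserialized once inside the write function.
    HashMap<String, String> doc = new HashMap<>();
    doc.put("user", "alice");
    doc.put("kind", "event");

    // The id and type are derived from payload fields; the index is constant.
    System.out.println(
        bulkActionLine(doc, d -> d.get("user"), d -> d.get("kind"), d -> "logs"));
  }
}
```

Here the id and type come from fields that are already part of the payload, 
which is the "stored as part of the payload" case described above.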





> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
>                 Key: BEAM-3201
>                 URL: https://issues.apache.org/jira/browse/BEAM-3201
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Chet Aldrich
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch 
> generates a document id for each record inserted. So each new insertion is 
> considered as a new document. Users want to be able to update documents using 
> the IO. So, for the write part of the IO, users should be able to provide a 
> document id so that they could update already stored documents. Providing an 
> id for the documents could also help the user with idempotency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
