[ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256316#comment-16256316
 ] 

Chet Aldrich commented on BEAM-3201:
------------------------------------

Hey, so I'd be happy to take this ticket on, and the design seems reasonable.

I have one question about the design above: 

The write API currently takes a PCollection<String>, as defined 
[here|https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L731],
and not a PCollection<JSONObject>. I suppose we could parse each String into a 
JSONObject or a similar construct and then look up the field specified in 
`withDocumentIdField`. I'm assuming that we _don't_ want to change the input 
type to PCollection<JSONObject>, right? Instead, we would just throw an 
exception if a String that is passed in is not valid JSON.
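
To make the parse-and-look-up step concrete, here is a minimal sketch. It is 
not the ElasticsearchIO implementation: `DocumentIdExtractor` and `extractId` 
are hypothetical names, and the small hand-written parser is an assumption 
made only to keep the example free of dependencies. A real change would 
presumably use a JSON library instead. The sketch also assumes the id is a 
top-level string, number, or boolean field.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of ElasticsearchIO: pulls an id out of a JSON document string. */
public class DocumentIdExtractor {
    private final String s;
    private int pos;

    private DocumentIdExtractor(String s) { this.s = s; }

    /** Returns the top-level scalar field idField as text; throws IllegalArgumentException otherwise. */
    public static String extractId(String document, String idField) {
        DocumentIdExtractor p = new DocumentIdExtractor(document);
        Map<String, String> fields = p.parseObject();
        p.skipWs();
        if (p.pos != document.length()) throw p.error("trailing characters");
        String id = fields.get(idField);
        if (id == null) throw new IllegalArgumentException("No scalar field '" + idField + "' in document");
        return id;
    }

    private IllegalArgumentException error(String msg) {
        return new IllegalArgumentException("Invalid JSON at offset " + pos + ": " + msg);
    }

    private char peek() { return pos < s.length() ? s.charAt(pos) : '\0'; }

    private void skipWs() { while (pos < s.length() && " \t\r\n".indexOf(s.charAt(pos)) >= 0) pos++; }

    private void expect(char c) {
        skipWs();
        if (peek() != c) throw error("expected '" + c + "'");
        pos++;
    }

    // Scalars come back as text; nested objects, arrays and null come back as null.
    private String parseValue() {
        skipWs();
        switch (peek()) {
            case '{': parseObject(); return null;
            case '[': parseArray(); return null;
            case '"': return parseString();
            default: return parseLiteral();
        }
    }

    private Map<String, String> parseObject() {
        Map<String, String> fields = new HashMap<>();
        expect('{');
        skipWs();
        if (peek() == '}') { pos++; return fields; }
        while (true) {
            String key = parseString();
            expect(':');
            fields.put(key, parseValue());
            skipWs();
            if (peek() != ',') break;
            pos++;
        }
        expect('}');
        return fields;
    }

    private void parseArray() {
        expect('[');
        skipWs();
        if (peek() == ']') { pos++; return; }
        while (true) {
            parseValue();
            skipWs();
            if (peek() != ',') break;
            pos++;
        }
        expect(']');
    }

    private String parseString() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        while (true) {
            if (pos >= s.length()) throw error("unterminated string");
            char c = s.charAt(pos++);
            if (c == '"') return sb.toString();
            if (c < 0x20) throw error("control character in string");
            if (c != '\\') { sb.append(c); continue; }
            if (pos >= s.length()) throw error("unterminated escape");
            char e = s.charAt(pos++);
            int i = "\"\\/bfnrt".indexOf(e);
            if (i >= 0) {
                sb.append("\"\\/\b\f\n\r\t".charAt(i));
            } else if (e == 'u' && pos + 4 <= s.length() && s.substring(pos, pos + 4).matches("[0-9a-fA-F]{4}")) {
                sb.append((char) Integer.parseInt(s.substring(pos, pos + 4), 16));
                pos += 4;
            } else {
                throw error("bad escape");
            }
        }
    }

    private String parseLiteral() {
        int start = pos;
        while (pos < s.length() && ",]} \t\r\n".indexOf(s.charAt(pos)) < 0) pos++;
        String token = s.substring(start, pos);
        if (token.equals("null")) return null;
        if (token.matches("true|false|-?(0|[1-9]\\d*)(\\.\\d+)?([eE][+-]?\\d+)?")) return token;
        throw error("unexpected token '" + token + "'");
    }
}
```

With this sketch, `extractId("{\"id\": \"42\", \"name\": \"x\"}", "id")` returns 
"42", while a String that is not a JSON object, or one that lacks the id 
field, is rejected with an IllegalArgumentException. The input type stays 
PCollection<String>, and the failure surfaces per document at write time.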

> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
>                 Key: BEAM-3201
>                 URL: https://issues.apache.org/jira/browse/BEAM-3201
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Etienne Chauchot
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch 
> generates a document id for each record inserted, so each new insertion is 
> treated as a new document. Users want to be able to update documents using 
> the IO. So, for the write part of the IO, users should be able to provide a 
> document id so that they can update already stored documents. Providing an 
> id for the documents could also help the user with idempotency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)