[ https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16256316#comment-16256316 ]
Chet Aldrich commented on BEAM-3201:
------------------------------------

Hey, so I'd be happy to take this ticket on, and the design seems reasonable. I have one question about the design above:

The write API currently takes a PCollection<String>, as defined [here|https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L731], and not a PCollection<JSONObject>.

I suppose we can convert the String that is passed in to a JSONObject or some similar construct and then try to find the field specified in `withDocumentIdField`.

I'm assuming that we _don't_ want to change the input type to PCollection<JSONObject>, right? We would instead just throw an exception if a String that is passed in is not valid JSON.

> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Etienne Chauchot
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch
> generates a document id for each record inserted. So each new insertion is
> considered as a new document. Users want to be able to update documents using
> the IO. So, for the write part of the IO, users should be able to provide a
> document id so that they could update already stored documents. Providing an
> id for the documents could also help the user on idempotency.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
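
For illustration only, the approach raised in the comment (keep the input as a String, parse it as JSON, read the configured id field, and fail on invalid JSON) could be sketched roughly as below. This is a language-neutral sketch in Python, not the Beam Java implementation: the function name `bulk_index_entry` and its parameters are hypothetical, and the choice to also reject documents that lack the id field is an assumption that the ticket does not settle. The only external convention it relies on is the Elasticsearch bulk format, where an action line carrying `_id` precedes the document source line.

```python
import json


def bulk_index_entry(document: str, id_field: str) -> str:
    """Build one bulk-API entry that indexes `document` under the id in `id_field`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    try:
        parsed = json.loads(document)
    except json.JSONDecodeError as err:
        # Mirrors the proposal: invalid JSON input is an error, not silently skipped.
        raise ValueError(f"document is not valid JSON: {err}") from err

    if not isinstance(parsed, dict):
        raise ValueError("document must be a JSON object")
    if id_field not in parsed:
        # Assumption: a missing id field is treated as an error.
        raise ValueError(f"document has no field {id_field!r}")

    action = {"index": {"_id": str(parsed[id_field])}}
    # Bulk entries are newline-delimited, so the source is re-serialized
    # onto a single line instead of reusing the raw input string.
    return json.dumps(action) + "\n" + json.dumps(parsed) + "\n"
```

Because the id is taken from the document itself, writing the same document twice targets the same `_id` and overwrites the stored copy instead of creating a second one, which is the update and idempotency behaviour the issue description asks for.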