[ https://issues.apache.org/jira/browse/BEAM-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16487085#comment-16487085 ]
Tim Robertson edited comment on BEAM-4389 at 5/23/18 11:09 AM:
---------------------------------------------------------------

Thanks for the quick reply [~echauchot]

The {{withUsePartialUpdate(true)}} option would simply change the {{bulk}} list sent to ES to contain {{update}} operations instead of {{index}} operations. On the server side, Elasticsearch treats this as a "get document, apply edits, save document" operation.

In our code I think it would be as simple as exposing the configuration toggle and changing:

{code}
batch.add(String.format("{ \"index\" : %s }%n%s%n", documentAddress, document));
{code}

to

{code}
String operation = spec.isPartialUpdate() ? "update" : "index";
batch.add(String.format("{ \"%s\" : %s }%n%s%n", operation, documentAddress, document));
{code}

Introducing new fields and schema compatibility seem no different from the current model (you can already push nonsense JSON to a live Elasticsearch today).

Or am I overlooking something?

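One detail that may be worth checking against the proposal above: in the Elasticsearch bulk format, the line following an {{update}} action is not the raw document but an object that carries the partial document under a {{doc}} key. A minimal, self-contained sketch of how the two cases could be built follows. The class name {{BulkEntrySketch}}, the method {{bulkEntry}}, and the use of {{doc_as_upsert}} are assumptions made for illustration, not the actual ElasticsearchIO code.

```java
// Sketch only: BulkEntrySketch and bulkEntry are hypothetical names, not Beam's actual code.
final class BulkEntrySketch {

  /**
   * Builds one action/payload pair for the Elasticsearch _bulk endpoint.
   *
   * @param documentAddress JSON object addressing the document, e.g. {"_id":"42"}
   * @param document the JSON document (or partial document) to send
   * @param usePartialUpdate true to emit an "update" action, false to emit "index"
   */
  static String bulkEntry(String documentAddress, String document, boolean usePartialUpdate) {
    if (usePartialUpdate) {
      // An "update" action expects the partial document under a "doc" key.
      // "doc_as_upsert" is an assumption here: it asks Elasticsearch to create
      // the document when no document with that ID exists yet.
      return String.format(
          "{ \"update\" : %s }%n{ \"doc\" : %s, \"doc_as_upsert\" : true }%n",
          documentAddress, document);
    }
    // Full-document write, same shape as the existing "index" line.
    return String.format("{ \"index\" : %s }%n%s%n", documentAddress, document);
  }

  public static void main(String[] args) {
    String address = "{\"_id\":\"42\"}";
    String partial = "{\"country\":\"DK\"}";
    System.out.print(bulkEntry(address, partial, true));
    System.out.print(bulkEntry(address, partial, false));
  }
}
```

Whether {{doc_as_upsert}} should be set is a design choice: with it, a pipeline can run in isolation even when the target document does not exist yet; without it, an update against a missing ID is reported as a failure for that item.
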
> Enable partial updates for Elasticsearch
> ----------------------------------------
>
>                 Key: BEAM-4389
>                 URL: https://issues.apache.org/jira/browse/BEAM-4389
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-elasticsearch
>    Affects Versions: 2.4.0
>            Reporter: Tim Robertson
>            Assignee: Tim Robertson
>            Priority: Major
>
> Expose a configuration option on the {{ElasticsearchIO}} to enable partial
> updates rather than full document inserts.
> Rationale: We have the case where different pipelines process different
> categories of information about the target entity (e.g. one for taxonomic
> processing, another for geospatial processing). A read and merge is not
> possible inside the batch call, meaning the only way to do it is through a
> join. The join approach is slow, and it also prevents running a single
> process in isolation (e.g. reprocessing the geospatial component of all docs).
> This configuration parameter only makes sense when used in conjunction with
> controlling the document ID (possible since BEAM-3201).
> The client API would include a {{withUsePartialUpdate(true)}} option, such as:
> {code}
> source.apply(
>   ElasticsearchIO.write()
>     .withConnectionConfiguration(connectionConfiguration)
>     .withIdFn(new ExtractValueFn("id"))
>     .withUsePartialUpdate(true));
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)