Tim Robertson created BEAM-4389:
-----------------------------------
Summary: Enable partial updates for Elasticsearch
Key: BEAM-4389
URL: https://issues.apache.org/jira/browse/BEAM-4389
Project: Beam
Issue Type: New Feature
Components: io-java-elasticsearch
Affects Versions: 2.4.0
Reporter: Tim Robertson
Assignee: Tim Robertson
Expose a configuration option on {{ElasticsearchIO}} to enable partial
updates rather than full document inserts.
Rationale: We have a case where different pipelines process different
categories of information about the same target entity (e.g. one for
taxonomic processing, another for geospatial processing). A read and merge
is not possible inside the batch call, so the only way to combine the
results is through a join. The join approach is slow, and it also prevents
running a single process in isolation (e.g. reprocessing the geospatial
component of all docs).
This configuration parameter only makes sense when used in conjunction with
controlling the document ID (possible since BEAM-3201).
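To make the difference between the two write modes concrete, here is a minimal in-memory sketch. It is not Elasticsearch and not the proposed implementation: the class {{PartialUpdateSketch}} and its methods are hypothetical, the merge is shallow (top-level fields only), and creating a missing document on update is an assumption.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an index, used only to contrast the two write modes. */
final class PartialUpdateSketch {
  // Document ID -> document fields.
  private final Map<String, Map<String, Object>> index = new HashMap<>();

  /** Full insert: whatever was stored under this ID is replaced wholesale. */
  void fullInsert(String id, Map<String, Object> doc) {
    index.put(id, new HashMap<>(doc));
  }

  /**
   * Partial update: top-level fields are merged into the stored document.
   * Assumption: a missing document is created rather than rejected.
   */
  void partialUpdate(String id, Map<String, Object> partialDoc) {
    index.computeIfAbsent(id, k -> new HashMap<>()).putAll(partialDoc);
  }

  Map<String, Object> get(String id) {
    return index.get(id);
  }

  public static void main(String[] args) {
    Map<String, Object> taxonomy = Collections.singletonMap("family", "Felidae");
    Map<String, Object> geo = Collections.singletonMap("country", "DK");

    PartialUpdateSketch es = new PartialUpdateSketch();

    // Two independent pipelines write to the same document ID.
    es.partialUpdate("doc-1", taxonomy);
    es.partialUpdate("doc-1", geo);
    System.out.println(es.get("doc-1").containsKey("family")); // true: taxonomy survived

    // A full insert from the geospatial pipeline drops the taxonomy fields.
    es.fullInsert("doc-1", geo);
    System.out.println(es.get("doc-1").containsKey("family")); // false
  }
}
```

Both pipelines must produce the same ID for the same entity, otherwise the second write creates a separate document instead of merging into the first.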
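The client API would include a {{withUsePartialUpdate(true)}} option, for
example:
{code}
source.apply(
    ElasticsearchIO.write()
        .withConnectionConfiguration(connectionConfiguration)
        // Control the document ID so repeated writes target the same document.
        .withIdFn(new ExtractValueFn("id"))
        // Proposed option: send partial updates instead of full inserts.
        .withUsePartialUpdate(true));
{code}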
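On the wire, the option would presumably switch the bulk request from {{index}} actions to {{update}} actions. The sketch below builds both forms of a single Bulk API entry. The {{index}} and {{update}} actions and the {{doc}} and {{doc_as_upsert}} fields are part of the Elasticsearch Bulk API; the class {{BulkActionSketch}} is hypothetical, and whether the option would set {{doc_as_upsert}} is an assumption, not something this issue specifies.

```java
/** Hypothetical helper that builds one newline-delimited Bulk API entry per document. */
final class BulkActionSketch {

  /** Full insert: an "index" action replaces any existing document with this ID. */
  static String indexAction(String id, String docJson) {
    return "{ \"index\" : { \"_id\" : \"" + id + "\" } }\n"
        + docJson + "\n";
  }

  /**
   * Partial update: an "update" action merges the given fields into the stored document.
   * Assumption: doc_as_upsert is set so a missing document is created from the partial doc.
   */
  static String updateAction(String id, String partialDocJson) {
    return "{ \"update\" : { \"_id\" : \"" + id + "\" } }\n"
        + "{ \"doc\" : " + partialDocJson + ", \"doc_as_upsert\" : true }\n";
  }

  public static void main(String[] args) {
    // The ID is inserted as-is; a real implementation would need to JSON-escape it.
    System.out.print(indexAction("42", "{\"country\":\"DK\"}"));
    System.out.print(updateAction("42", "{\"country\":\"DK\"}"));
  }
}
```

An {{update}} action cannot target a document without an ID, which is why the option depends on {{withIdFn}}.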
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)