[ 
https://issues.apache.org/jira/browse/BEAM-4389?focusedWorklogId=106619&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-106619
 ]

ASF GitHub Bot logged work on BEAM-4389:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 29/May/18 13:10
            Start Date: 29/May/18 13:10
    Worklog Time Spent: 10m 
      Work Description: echauchot commented on a change in pull request #5463: [BEAM-4389] Enable partial updates in ElasticsearchIO
URL: https://github.com/apache/beam/pull/5463#discussion_r191411251
 
 

 ##########
 File path: sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
 ##########
 @@ -945,9 +970,18 @@ private static String lowerCaseOrNull(String input) {
       @ProcessElement
       public void processElement(ProcessContext context) throws Exception {
         String document = context.element();
-        String documentAddress = getDocumentAddress(document);
+        String documentMetadata = getDocumentMetadata(document);
+
+        // index is an insert/upsert and update is a partial update (or insert if not existing)
+        if (spec.getUsePartialUpdate()) {
+          batch.add(
+              String.format(
+                  "{ \"update\" : %s }%n{ \"doc\" : %s, \"doc_as_upsert\" : true }%n",
 
 Review comment:
   Indeed, it is better to do an upsert than the simple update that was in your 
first commit.
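
   To make the difference concrete, here is a minimal sketch of the two bulk
   entry shapes being discussed. It is not the code from the pull request:
   the class name BulkEntrySketch and the method bulkEntry are hypothetical,
   the "index" branch is assumed from the general Elasticsearch bulk format
   rather than taken from the hunk above, and "\n" is used instead of "%n" so
   the output is the same on every platform.

```java
// Hypothetical helper, not part of ElasticsearchIO: builds one entry of an
// Elasticsearch bulk request body for either of the two modes.
final class BulkEntrySketch {

  /**
   * @param metadata JSON object with the action metadata, e.g. {"_id":"1"}
   * @param document JSON document (full document, or the fields to merge)
   * @param usePartialUpdate true for "update" with doc_as_upsert, false for "index"
   */
  static String bulkEntry(String metadata, String document, boolean usePartialUpdate) {
    if (usePartialUpdate) {
      // Partial update: merges the given fields into an existing document,
      // or inserts the document if it does not exist yet (doc_as_upsert).
      return String.format(
          "{ \"update\" : %s }\n{ \"doc\" : %s, \"doc_as_upsert\" : true }\n",
          metadata, document);
    }
    // Index: inserts the document, or replaces an existing one as a whole.
    return String.format("{ \"index\" : %s }\n%s\n", metadata, document);
  }

  public static void main(String[] args) {
    String metadata = "{\"_id\":\"1\"}";
    String document = "{\"country\":\"DK\"}";
    System.out.print(bulkEntry(metadata, document, true));
    System.out.print(bulkEntry(metadata, document, false));
  }
}
```

   With a plain "update" action and no "doc_as_upsert", the entry fails for
   document IDs that do not exist yet, which is why the upsert form is the
   safer default for a write transform.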

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 106619)
    Time Spent: 0.5h  (was: 20m)

> Enable partial updates Elasticsearch
> ------------------------------------
>
>                 Key: BEAM-4389
>                 URL: https://issues.apache.org/jira/browse/BEAM-4389
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-java-elasticsearch
>    Affects Versions: 2.4.0
>            Reporter: Tim Robertson
>            Assignee: Tim Robertson
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Expose a configuration option on the {{ElasticsearchIO}} to enable partial 
> updates rather than full document inserts. 
> Rationale: We have the case where different pipelines process different 
> categories of information of the target entity (e.g. one for taxonomic 
> processing, another for geospatial processing). A read and merge is not 
> possible inside the batch call, meaning the only way to do it is through a 
> join. The join approach is slow and also prevents running a single 
> process in isolation (e.g. reprocess the geospatial component of all docs).
> To make sense, this configuration parameter has to be used in conjunction 
> with controlling the document ID (possible since BEAM-3201).
> The client API would include a {{withUseUpdate(...)}} such as:
> {code}
> source.apply(
>   ElasticsearchIO.write()
>     .withConnectionConfiguration(connectionConfiguration)
>     .withIdFn(new ExtractValueFn("id"))
>     .withUseUpdate(true));
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
