[ 
https://issues.apache.org/jira/browse/BEAM-12601?focusedWorklogId=632928&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-632928
 ]

ASF GitHub Bot logged work on BEAM-12601:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 03/Aug/21 13:55
            Start Date: 03/Aug/21 13:55
    Worklog Time Spent: 10m 
      Work Description: egalpin commented on a change in pull request #15257:
URL: https://github.com/apache/beam/pull/15257#discussion_r681775815



##########
File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1512,10 +1528,14 @@ static String createBulkApiEntity(DocToBulk spec, 
String document, int backendVe
           isDelete = spec.getIsDeleteFn().apply(parsedDocument);
         }
       }
+      final boolean isAppendOnly = Boolean.TRUE.equals(spec.getAppendOnly());

Review comment:
       I also think that we should be consistent in how we make use of the 
various settings: either we should use the approach of 
`Boolean.TRUE.equals(...)` or the existing non-nullable approach setting a 
default via builder.  I personally prefer the method you've used for its 
flexibility and ease of maintenance over default values set via builder.
   
   Could you please update `getUsePartialUpdate` to be used in the same manner?
   

##########
File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1333,6 +1337,18 @@ public DocToBulk withUsePartialUpdate(boolean 
usePartialUpdate) {
       return builder().setUsePartialUpdate(usePartialUpdate).build();
     }
 
+    /**
+     * Provide an instruction to control whether the target index should be 
considered append-only.
+     * For append-only indexes and/or data streams, only {@code create} 
operations will be issued.
+     * Updates and deletions are not allowed, so related options will be 
ignored.
+     *
+     * @param appendOnly set to true to allow one document appending

Review comment:
       nit:  typo I think.  `to allow only` rather than `to allow one`

##########
File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1333,6 +1337,18 @@ public DocToBulk withUsePartialUpdate(boolean 
usePartialUpdate) {
       return builder().setUsePartialUpdate(usePartialUpdate).build();
     }
 
+    /**
+     * Provide an instruction to control whether the target index should be 
considered append-only.
+     * For append-only indexes and/or data streams, only {@code create} 
operations will be issued.

Review comment:
       One interesting point about create Vs. index:  unless specifying one's 
own document IDs, the two methods result in  the same behaviour[1].  I think 
having a note here in the docstring explaining that point would be a 
nice-to-have addition. Really, we should rely on users to consult the 
Elasticsearch (or any Sink's) docs to determine their needs, but illustrating 
that the 2 options are effectively equivalent depending on other settings might 
prompt users to double check the ES docs for their use case.
   
   [1] 
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#docs-bulk-api-desc

##########
File path: 
sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java
##########
@@ -1512,10 +1528,14 @@ static String createBulkApiEntity(DocToBulk spec, 
String document, int backendVe
           isDelete = spec.getIsDeleteFn().apply(parsedDocument);
         }
       }
+      final boolean isAppendOnly = Boolean.TRUE.equals(spec.getAppendOnly());
 
       if (isDelete) {
+        checkState(!isAppendOnly, "No deletions allowed for append-only 
indices");
         // delete request used for deleting a document
         return String.format("{ \"delete\" : %s }%n", documentMetadata);
+      } else if (isAppendOnly) {

Review comment:
       I wonder about this control structure a bit. The `isDelete` case is 
special because its truthiness will change depending on the document and a 
function applied to that document. The other settings all always have constant 
behaviour regardless of input document. 
   
   Given the number of branches here, I think it would be best to restructure 
this section to return early whenever possible and reduce the nesting of the 
logic
   
   So with both those points said, I think we should unnest the 
settings-branching making each setting its own `if-return` rather than having 
any `else/else if` such that all settings are flat but listed in order of 
precedence.
   
   Thoughts?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Issue Time Tracking
-------------------

    Worklog Id:     (was: 632928)
    Time Spent: 1h 50m  (was: 1h 40m)

> Support append-only indices in ES output 
> -----------------------------------------
>
>                 Key: BEAM-12601
>                 URL: https://issues.apache.org/jira/browse/BEAM-12601
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-elasticsearch
>            Reporter: Andres Rodriguez
>            Priority: P2
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Currently, the Apache Beam Elasticsearch sink is 
> [using|https://github.com/apache/beam/blob/master/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L1532]
>  the 
> [index|https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-api-request-body]
>  bulk API operation to add data to the target index. When using append-only 
> indices it is better to use the 
> [create|https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html#bulk-api-request-body]
>  operation. This also applies to new append-only indexes, like [data 
> streams|https://www.elastic.co/guide/en/elasticsearch/reference/7.x/use-a-data-stream.html#add-documents-to-a-data-stream].
> The scope of this improvement is to add a new boolean configuration option, 
> {{append-only}}, to the Elasticsearch sink, with a default value of {{false}} 
> (to keep the current behaivour) that when enabled makes it use the {{create}} 
> operation instead of the {{index}} one when sending data.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to