[jira] [Commented] (BEAM-3201) ElasticsearchIO should deal with documents id

Chet Aldrich (JIRA) Wed, 22 Nov 2017 14:33:34 -0800

    [ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263472#comment-16263472
 ]


Chet Aldrich commented on BEAM-3201:
------------------------------------

[~echauchot] First of all, thanks for getting that all sorted for me.

[~nerdynick] 

{quote}That said, if you want to have dynamic index/type (meaning do not use 
ConnectionConfiguration.withIndex and ConnectionConfiguration.withType) and 
also dynamic id depending of the document itself, we should add 3 optional user 
defined functions so that the user can provide them. I guess it makes the 
withDocumentIdField(String fieldName) redundant. So we should not implement 
it.{quote}

Given what Etienne said here, if we go this route we may want to rethink 
the design, especially since I agree with him about not polluting the 
document payload. 

However, I'm not yet convinced this is necessary in the first place. 
Could you ([~nerdynick]) elaborate on why your use case requires changing 
the index and type dynamically on a per-element basis? Why not split up the 
elements and write each group to its own index via a separate sink with a 
different `ConnectionConfiguration`? IMHO one write operation should write to 
only one index. It would be odd, for example, to write entries to two 
different DB tables depending on a given element instead of splitting them 
into separate PCollections and _then_ writing them out to the different 
tables with separate sinks. 
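
To make the "split first, then one sink per index" alternative concrete, here 
is a rough plain-Java sketch. It deliberately uses no Beam types: the class 
name, the `splitByIndex` helper, and the routing rule are all hypothetical. 
In a real pipeline the split would be a transform (Beam's `Partition`, for 
instance) and each group would be its own PCollection feeding its own write.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the routing idea; not part of ElasticsearchIO.
class SplitByIndexSketch {

    // Groups documents by the index name chosen by the caller-supplied router.
    // Insertion order of both the groups and their documents is preserved.
    static Map<String, List<String>> splitByIndex(
            List<String> documents, Function<String, String> indexRouter) {
        Map<String, List<String>> perIndex = new LinkedHashMap<>();
        for (String doc : documents) {
            perIndex.computeIfAbsent(indexRouter.apply(doc), k -> new ArrayList<>())
                    .add(doc);
        }
        return perIndex;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "{\"kind\":\"order\",\"total\":12}",
                "{\"kind\":\"user\",\"name\":\"ann\"}",
                "{\"kind\":\"order\",\"total\":7}");

        // Assumed routing rule: orders go to one index, everything else to another.
        Function<String, String> router =
                doc -> doc.contains("\"kind\":\"order\"") ? "orders" : "users";

        // Each group would be handed to a separate sink configured for one index.
        splitByIndex(docs, router).forEach((index, group) ->
                System.out.println(index + " <- " + group.size() + " document(s)"));
    }
}
```

The point of the sketch is that the routing decision happens before the write, 
so every sink still targets exactly one index.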

This opinion is only based on my current understanding of what you're trying to 
accomplish though. Feel free to enlighten me if an assumption I made about your 
use case is incorrect. 

Would appreciate input from both of you on whether this use case is needed, and 
if it is, whether we should rethink how we're approaching this so we don't 
pollute the document payload with metadata. 
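
For reference, a rough sketch of what "metadata outside the payload" could 
look like, assuming the id comes from a user-supplied function. The class, 
the `bulkEntry` helper, and the `sku` field are hypothetical and not existing 
ElasticsearchIO API. The sketch relies only on the Elasticsearch bulk format, 
where an action line carrying `_index`, `_type` and `_id` precedes the 
document source.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; illustrates the idea only.
class BulkMetadataSketch {

    // Builds one entry of a bulk request body: an action line with the
    // metadata, then the document exactly as the user supplied it.
    // Assumes index, type and id contain no characters that need JSON escaping.
    static String bulkEntry(String document, String index, String type,
                            Function<String, String> idFn) {
        String action = String.format(
                "{\"index\":{\"_index\":\"%s\",\"_type\":\"%s\",\"_id\":\"%s\"}}",
                index, type, idFn.apply(document));
        return action + "\n" + document + "\n";
    }

    public static void main(String[] args) {
        // Assumed id rule: reuse a natural key ("sku") already in the document.
        Pattern sku = Pattern.compile("\"sku\":\"([^\"]+)\"");
        Function<String, String> idFn = doc -> {
            Matcher m = sku.matcher(doc);
            if (!m.find()) {
                throw new IllegalArgumentException("document has no sku field");
            }
            return m.group(1);
        };

        String doc = "{\"sku\":\"A-17\",\"price\":3}";
        System.out.print(bulkEntry(doc, "products", "item", idFn));
    }
}
```

Because the id is derived from the document rather than written into it, the 
stored source stays identical to the input, and re-running the same write 
targets the same document id.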

> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
>                 Key: BEAM-3201
>                 URL: https://issues.apache.org/jira/browse/BEAM-3201
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Chet Aldrich
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch 
> generates a document id for each record inserted. So each new insertion is 
> considered as a new document. Users want to be able to update documents using 
> the IO. So, for the write part of the IO, users should be able to provide a 
> document id so that they could update already stored documents. Providing an 
> id for the documents could also help the user with idempotency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
