[ 
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264618#comment-16264618
 ] 

Nicholas Verbeck commented on BEAM-3201:
----------------------------------------

[~echauchot] When dealing with time series data, as well as other sets of 
highly dynamic data, in a streaming fashion, the partition approach is just not 
a practical one.

If we only do the function approach, then I'd suggest we either change the 
method signature to accept a JSON object, or parse the provided string once and 
pass the result to each function. Having both functions deserialize the JSON 
into a usable data model separately would waste a lot of CPU, especially when 
dealing with large volumes of data, which is my use case. If we don't want to 
go that route, then I'd suggest doing both: the field lookup and the function. 
This would leave it up to the engineer to choose whichever suits their use case 
and resources better. 
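
A rough sketch of the three options, to make the cost difference concrete. 
Everything here is hypothetical and not the ElasticsearchIO API: the class and 
method names are made up, parseFlatJson is a toy stand-in for a real JSON 
parser, and I'm assuming the two functions are one that derives the document id 
and one that derives a target index.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class ParseOnceSketch {
    // Counts parser invocations so the duplicated work is visible.
    static int parseCount = 0;

    // Toy stand-in for a real JSON parser (assumption): handles only flat
    // objects with string values and no commas, escapes, or nesting.
    static Map<String, String> parseFlatJson(String json) {
        parseCount++;
        Map<String, String> fields = new HashMap<>();
        String body = json.trim();
        body = body.substring(1, body.length() - 1); // strip the outer braces
        if (body.trim().isEmpty()) {
            return fields;
        }
        for (String pair : body.split(",")) {
            String[] kv = pair.split(":", 2);
            fields.put(unquote(kv[0]), unquote(kv[1]));
        }
        return fields;
    }

    private static String unquote(String s) {
        String t = s.trim();
        return t.substring(1, t.length() - 1);
    }

    // Option 1: each user function receives the raw string and must parse it itself.
    static String[] resolveFromString(
            String json, Function<String, String> idFn, Function<String, String> indexFn) {
        return new String[] {idFn.apply(json), indexFn.apply(json)};
    }

    // Option 2: parse once, then hand the parsed document to each function.
    static String[] resolveFromParsed(
            String json,
            Function<Map<String, String>, String> idFn,
            Function<Map<String, String>, String> indexFn) {
        Map<String, String> doc = parseFlatJson(json);
        return new String[] {idFn.apply(doc), indexFn.apply(doc)};
    }

    // Option 3: plain field lookup, no user function involved.
    static String resolveFromField(String json, String fieldName) {
        return parseFlatJson(json).get(fieldName);
    }

    public static void main(String[] args) {
        String doc = "{\"id\":\"sensor-1\",\"day\":\"2017-11-23\"}";

        parseCount = 0;
        resolveFromString(
                doc,
                s -> parseFlatJson(s).get("id"),
                s -> "metrics-" + parseFlatJson(s).get("day"));
        System.out.println("string functions, parses per document: " + parseCount);

        parseCount = 0;
        resolveFromParsed(doc, d -> d.get("id"), d -> "metrics-" + d.get("day"));
        System.out.println("parsed functions, parses per document: " + parseCount);
    }
}
```

With string-typed functions every document is parsed once per function; with 
the parsed variant it is parsed once in total, and the field lookup covers the 
simple case without any user code.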


> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
>                 Key: BEAM-3201
>                 URL: https://issues.apache.org/jira/browse/BEAM-3201
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-extensions
>            Reporter: Etienne Chauchot
>            Assignee: Chet Aldrich
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch 
> generates a document id for each record inserted. So each new insertion is 
> considered as a new document. Users want to be able to update documents using 
> the IO. So, for the write part of the IO, users should be able to provide a 
> document id so that they could update already stored documents. Providing an 
> id for the documents could also help the user with idempotency.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
