[
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16264618#comment-16264618
]
Nicholas Verbeck commented on BEAM-3201:
----------------------------------------
[~echauchot] When dealing with time series data, as well as other sets of
highly dynamic data, in a streaming fashion, the partition approach is just not
a practical one.
If we only do the function approach, then I'd suggest we either change the
method signature to accept a JSON object, or parse the provided string once and
give the result to each function. The extra CPU overhead of having both
functions deserialize the JSON into a usable data model would be a big waste,
especially when dealing with large volumes of data, which is my use case. If we
don't want to go that route, then I'd suggest doing both: the field lookup and
the function. This would leave it up to the engineer to choose whichever suits
their use case and resources better.
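
To illustrate the parse-once idea, here is a minimal sketch. It is written in
Python only for brevity (ElasticsearchIO itself is Java), and every name in it
(bulk_metadata_parse_once, field_lookup, the "_id"/"_index" keys, the sample
document) is hypothetical and not part of any existing Beam API. The document
string is deserialized a single time and the parsed object is handed to each
user function; a plain field lookup is just a special case of such a function.

```python
import json
from typing import Any, Callable, Dict

Doc = Dict[str, Any]


def bulk_metadata_parse_once(
    raw: str, extractors: Dict[str, Callable[[Doc], str]]
) -> Dict[str, str]:
    """Deserialize the document once and pass the parsed object to every extractor."""
    doc = json.loads(raw)  # the only JSON parse, shared by all functions
    return {key: fn(doc) for key, fn in extractors.items()}


def field_lookup(field: str) -> Callable[[Doc], str]:
    """Hypothetical 'field lookup' option expressed as an extractor function."""
    return lambda doc: str(doc[field])


# Hypothetical time series record: the id combines two fields, and the
# index name is derived from the date part of the timestamp.
raw = '{"sensor": "s-17", "ts": "2017-11-23T10:15:00Z", "value": 3.2}'
meta = bulk_metadata_parse_once(
    raw,
    {
        "_id": lambda doc: doc["sensor"] + "@" + doc["ts"],
        "_index": lambda doc: "metrics-" + doc["ts"][:10],
    },
)
print(meta)
# {'_id': 's-17@2017-11-23T10:15:00Z', '_index': 'metrics-2017-11-23'}
```

With string-based functions, each of the two extractors above would have to
parse the JSON on its own, which doubles the parsing cost per record.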
> ElasticsearchIO should deal with documents id
> ---------------------------------------------
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: sdk-java-extensions
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
>
> Today the ESIO only inserts the payload of the ES documents. Elasticsearch
> generates a document id for each record inserted. So each new insertion is
> considered as a new document. Users want to be able to update documents using
> the IO. So, for the write part of the IO, users should be able to provide a
> document id so that they could update already stored documents. Providing an
> id for the documents could also help the user with idempotency.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)