[
https://issues.apache.org/jira/browse/BEAM-3201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16414640#comment-16414640
]
Tim Robertson edited comment on BEAM-3201 at 3/27/18 9:00 AM:
--------------------------------------------------------------
[~chet.aldrich] I really don't want to step on your toes, but I needed this
functionality, so I have written an implementation. I couldn't have done this
so easily without your work (thanks!).
(Edited link) [https://github.com/timrobertson100/beam/tree/BEAM-3201]
I made the following changes from the approach [~chet.aldrich] had started:
# Used a single interface instead of three, with a different signature
# Used Jackson for JSON serde
## Removes need to bring in another dependency
## I _suspect_ Jackson is in wider use, so it might be the better choice given
that it forms part of the public API
# Added the ability to route to different types which was not yet implemented
in your branch
# Added sanity checking to the index field (ES requires lower case)
# Opted for a different test strategy to avoid using the deprecated DoFnTester
Would it be ok with you [~chet.aldrich] / [~echauchot] if I put this up for a
PR tomorrow after I tidy my reformatting error please? Other than formatting I
think it is a complete solution to this issue with test coverage.
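To make the first and fourth points above more concrete, here is a minimal sketch of what a single extractor interface plus a lower-case check on the index could look like. It is not the code in the linked branch: the names {{PerDocumentRouting}}, {{FieldFn}}, {{checkIndex}} and {{resolveIndex}} are hypothetical, and the document is assumed to be already parsed into a {{Map}} (the branch uses Jackson for that step, which is left out here so the example only needs the JDK).

```java
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

final class PerDocumentRouting {

  /** Hypothetical single extractor shape, reused for id, type and index alike. */
  interface FieldFn extends Function<Map<String, Object>, String> {}

  /** Rejects empty index names and names containing upper-case characters. */
  static String checkIndex(String index) {
    if (index == null || index.isEmpty()) {
      throw new IllegalArgumentException("index must not be empty");
    }
    if (!index.equals(index.toLowerCase(Locale.ROOT))) {
      throw new IllegalArgumentException("index must be lower case: " + index);
    }
    return index;
  }

  /** Applies the user-supplied function to one document and validates the result. */
  static String resolveIndex(FieldFn indexFn, Map<String, Object> doc) {
    return checkIndex(indexFn.apply(doc));
  }
}
```

A caller would pass something like {{doc -> "occurrences-" + doc.get("country")}} as the {{FieldFn}}; failing fast on an upper-case name surfaces the problem in the pipeline rather than as a rejected request from Elasticsearch.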
was (Author: timrobertson100):
[~chet.aldrich] I really don't want to step on your toes, but I needed this
functionality, so I have written an implementation. I couldn't have done this
so easily without your work (thanks!).
[https://github.com/timrobertson100/beam/commit/a6002f1a4b8388e955e512281d38001ae828cdcf]
The commit above needs a little bit of tidying as I have accidentally
reformatted the whole SolrIO incorrectly - but it is late here and I'll do it
tomorrow.
> ElasticsearchIO should allow the user to optionally pass id, type and index
> per document
> ----------------------------------------------------------------------------------------
>
> Key: BEAM-3201
> URL: https://issues.apache.org/jira/browse/BEAM-3201
> Project: Beam
> Issue Type: Improvement
> Components: io-java-elasticsearch
> Reporter: Etienne Chauchot
> Assignee: Chet Aldrich
> Priority: Major
>
> *Dynamic document id*: Today the ESIO only inserts the payload of the ES
> documents. Elasticsearch generates a document id for each record inserted, so
> each new insertion is considered a new document. Users want to be able to
> update documents using the IO. So, for the write part of the IO, users should
> be able to provide a document id so that they can update already stored
> documents. Providing an id for the documents could also help the user achieve
> idempotency.
> *Dynamic ES type and ES index*: In some cases (streaming pipelines with high
> throughput), partitioning the PCollection so that each partition can be
> plugged into a different ESIO instance (each pointing to a different
> index/type) is not very practical. Users would like to be able to set the ES
> index/type per document.
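
As a rough illustration of what "id, type and index per document" means on the wire: the Elasticsearch bulk API takes an action line followed by the document source, and the action line can carry {{_index}}, {{_type}} and {{_id}} for that one document. The sketch below builds such a pair with plain string handling. The class and method names are hypothetical, it assumes an Elasticsearch version that still accepts {{_type}}, and it is not taken from ElasticsearchIO.

```java
final class BulkActionLine {

  /** Minimal JSON string quoting: escapes backslashes and double quotes only. */
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Builds the action line; null fields are omitted so the request defaults apply. */
  static String indexAction(String index, String type, String id) {
    StringBuilder sb = new StringBuilder("{\"index\":{");
    String sep = "";
    if (index != null) {
      sb.append(sep).append("\"_index\":").append(quote(index));
      sep = ",";
    }
    if (type != null) {
      sb.append(sep).append("\"_type\":").append(quote(type));
      sep = ",";
    }
    if (id != null) {
      sb.append(sep).append("\"_id\":").append(quote(id));
    }
    return sb.append("}}").toString();
  }

  /** One bulk entry: action line, newline, document source, newline. */
  static String bulkEntry(String index, String type, String id, String sourceJson) {
    return indexAction(index, type, id) + "\n" + sourceJson + "\n";
  }
}
```

With all three fields left null the action line is {{{"index":{}}}}, which corresponds to today's behaviour where Elasticsearch generates the id; supplying an id makes a repeated write overwrite the same document instead of creating a new one.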