Hi Gordon, A first step: https://github.com/apache/flink/pull/6043
-- Christophe On Wed, May 16, 2018 at 12:16 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> wrote: > Good to know! Thanks a lot for pushing this Christophe. > > Please ping me when the new PR is opened, and we can continue the > discussion there. > Ideally we have this in early in the 1.6 release cycle, so that the > Elasticsearch e2e tests (will be merging a PR for that soon) can catching > anything unexpected. > > Cheers, > Gordon > > > On 16 May 2018 at 4:26:36 PM, Christophe Jolif (cjo...@gmail.com) wrote: > > Ok thanks for the feedback. > > > I agree. IIRC, the ES PRs that were opened also did this by changing > the return type from Client to AutoClosable, as well as letting the call > bridge also handle creation of BulkProcessors, correct? > > Correct. > > > Instead, we maintain our own request class to abstract those classes > away, and only create the actual Elasticsearch requests internally. > > I'm personally unsure I would like to have to use Flink specific request > classes instead of Elastic ones but apart from that I think we are pretty > much inline. > > I'm not exactly sure when I'll get the cycles but I will try to issue yet > another PR along those lines so we can continue the discussion from there > and hopefully we would get something in time for 1.6! > > Thanks again, > -- > Christophe > > On Wed, May 16, 2018 at 7:19 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org> > wrote: > >> Hi, >> >> What if the user in a ES5.3+ case is calling the deprecated method? You >> agree it will fail? I'm not necessarily against that. I just want to make >> it clear that we don't have a perfect solution here either? >> >> >> I think what we could do here is at least in the deprecated >> `add(ActionRequest)` method, try casting to either `IndexRequest` or >> `DeleteRequest` and forward calls to the new methods. >> As a matter of fact, this is exactly what the Elasticsearch BulkProcessor >> API is doing internally [1] [2] [3], so we should be safe here. >> Looking at the code in [1] [2] [3], we should actually also consider it >> being a `UpdateRequest` (just to correct my previous statement that it can >> only be either index or delete). >> But yes, I think there wouldn’t be a perfect solution here; something has >> to be broken / deprecated in order for our implementation to be more future >> proof. >> >> indeed we always considered that among >> ActionRequest only delete and index were actually supported (which makes >> sense to me). >> >> >> Looking at the code in [1] [2] [3], we should actually also consider it >> being a `UpdateRequest` (just to correct my previous statement that it can >> only be either index or delete). >> And yes, the Elasticsearch Javadocs of the BulkProcessor also clearly >> state this. >> >> For the REST part, there is another incompatibility at the "Client" API >> level. Indeed the RestClient is not implementing the "Client" interface >> that is exposed in the bridge. So "just" doing the change above would >> still >> not allow to provide a REST based implementation? >> >> >> I agree. IIRC, the ES PRs that were opened also did this by changing the >> return type from Client to AutoClosable, as well as letting the call bridge >> also handle creation of BulkProcessors, correct? >> I think this is definitely the way to go for us to be more future proof. >> The general rule of thumb is that, for all APIs (either APIs of the base >> module that the ES connectors internally use, or user-facing APIs of the >> connector), we should move towards not leaking Elasticsearch classes as the >> API. >> This goes for removing Client as the return type in the call bridge >> interface. When re-working the RequestIndexer interface, we can even >> consider not directly exposing the `IndexRequest`, `DeleteRequest`, >> `UpdateRequest` classes as the API. >> Instead, we maintain our own request class to abstract those classes >> away, and only create the actual Elasticsearch requests internally. >> >> Cheers, >> Gordon >> >> [1] https://github.com/elastic/elasticsearch/blob/v5.2.2/cor >> e/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L110 >> [2] https://github.com/elastic/elasticsearch/blob/v6.2.4/ser >> ver/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L128 >> >> On 16 May 2018 at 4:12:15 AM, Christophe Jolif (cjo...@gmail.com) wrote: >> >> Hi Gordon, >> >> On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org >> > >> wrote: >> >> > Hi, >> > >> > Let me first clarify a few things so that we are on the same page here: >> > >> > 1. The reason that we are looking into having a new base-module for >> > versions 5.3+ is that >> > new Elasticsearch BulkProcessor APIs are breaking some of our original >> > base API assumptions. >> > This is noticeable from the intended cast here [1]. >> > >> > 2. Given that we are looking into a new base module, Christophe proposed >> > that we focus on designing >> > the new base module around Elasticsearch’s new REST API, so that we are >> > more-future proof. >> > >> > 3. I proposed to remove / deprecate support for earlier versions, >> because >> > once we introduce the new >> > base module, we would be maintaining a lot of Elasticsearch connector >> > modules (2 base modules, 4+ version specific). >> > Of course, whether or not this really is a problem depends on how much >> > capacity the community has to maintain them, as Steve mentioned. >> > >> > >> > Now, moving on to another proposed solution that should work (at least >> for >> > now, depends on how Elasticsearch changes their API in the future): >> > The main problem we’ve bumped into is that Elasticsearch changed their >> > BulkProcessor API from accepting `add(ActionRequest)`s to >> > `add(DocWriteRequest)`s. >> > >> > The reason why our elasticsearch-base-module used the `ActionRequest` in >> > the first place was so that sinks could have full functionality to >> perform >> > both delete and write requests. >> > We can actually just consider separating `RequestIndexer#add(ActionRequ >> est >> > …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add( >> > DeleteRequest)`. >> > AFAIK, the IndexRequest class and DeleteRequest class has remained >> stable >> > across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with >> > that. >> > >> > If I’m not mistaken, this should allow us to avoid introducing a new >> > module (no need for a new base AND a 5.3+ module), still have equal >> > functionality, and continue to use the existing base module across all >> > versions. >> > The only downside would be that we’ll need to deprecate / break >> > `RequestIndexer#add(ActionRequest …)` in favor of the new separated >> > methods. >> > >> > Whether or not we move on to use the HighLevelRestClient for 6.x, would >> > then be a completely orthogonal consideration. We can simply use it in >> 6.x >> > as an implementation of the API call bridge. >> > >> > What do you think? >> > >> >> What if the user in a ES5.3+ case is calling the deprecated method? You >> agree it will fail? I'm not necessarily against that. I just want to make >> it clear that we don't have a perfect solution here either? If we ignore >> that it should be working fine if indeed we always considered that among >> ActionRequest only delete and index were actually supported (which makes >> sense to me). >> >> For the REST part, there is another incompatibility at the "Client" API >> level. Indeed the RestClient is not implementing the "Client" interface >> that is exposed in the bridge. So "just" doing the change above would >> still >> not allow to provide a REST based implementation? >> >> Best, >> > Gordon >> > >> > [1] https://github.com/apache/flink/pull/4675/files#diff- >> > 5539d0d57fa1ac17a36f08d1bb9a90b5R54 >> > >> > On 15 May 2018 at 9:15:52 AM, Steve Blackmon (sblack...@apache.org) >> wrote: >> > >> > It seems to me that if the transport client dependency is removed, the >> > same >> > module could perform inserts, updates, and deletes via the http bulk >> API, >> > and whatever version differences exist with that API could be handled >> > inside the module without any difference to the classpath of the >> > pipeline. >> > >> > If that's true there's no need or benefit to deprecating support for >> > earlier elastic version so long as someone is willing to implement test >> > and >> > maintain them. >> > >> > Steve >> > >> > On 5/13/18 at 2:00 PM, Christophe wrote: >> > >> > Hi Gordon, >> > >> > Thanks for your feedback (and Flavio for your support!) >> > >> > About your remarks/questions: >> > >> > - Maybe we can consider removing support for ES 1.x and 2.x starting >> from >> > >> > 1.6. Those are very old ES versions (considering that ES 6.x has already >> > been out for a while). Do you think this would simply how our base >> module >> > APIs are designed? >> > >> > I would tend to say it should not change drastically the picture but >> > would >> > have to look into it. >> > >> > - Wouldn't it be possible to have a REST implementation of the >> > >> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If >> > so, >> > once we remove ES 1.x and 2.x, it might actually be possible to >> > completely >> > replace the current `elasticsearch-base` module. >> > >> > The High level REST API was introduced in Elasticsearch 5.6 so it is not >> > possible to cover 5.5 and below with it. >> > >> > If all the necessary APIs are already here (to be double checked) it >> > should >> > be able cover 5.6. What I noticed when working on the PRs is that 6.2 >> > REST >> > Level High Level client API was improved to be closer to original APIs, >> > if >> > we want to support 5.6 with it we might have to rely on APIs they >> already >> > improved since then. Not dramatic. But does it worth it knowing this >> > would >> > just be giving us 5.6 not 5.2,3,4 and 5? >> > >> > Now on moving forward I read: >> > >> > I'm definitely a +1 to try to move this forward with a proper fix. >> > >> > >> > and >> > >> > Working around that would require introducing a new base module >> > >> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't >> a >> > nice way to go. >> > >> > So if I read you correctly you are ok moving with a proper fix but it >> > should not introduce a new (REST based) base module? Then to be honest >> > I'm >> > not sure how to proceed :) Any more specific feedback on the direction >> to >> > follow would be great! >> > >> > Thanks, >> > -- >> > Christophe >> > >> > On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai < >> tzuli...@apache.org> >> > >> > wrote: >> > >> > Hi Christophe, >> > >> > Thanks for bringing this up. >> > >> > Yes, the main issue with the existing PRs and preventing it from moving >> > forward is how it currently breaks initial assumptions of APIs in the >> > `elasticsearch-base` module. >> > Working around that would require introducing a new base module >> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't >> a >> > nice way to go. >> > >> > I had a quick stab at the REST API, and it seems to be promising, >> > especially given that you mentioned that starting from next versions, >> the >> > current API we use will be fully removed. >> > I'm definitely a +1 to try to move this forward with a proper fix. >> > >> > Some other remarks / questions I have: >> > - Maybe we can consider removing support for ES 1.x and 2.x starting >> from >> > 1.6. Those are very old ES versions (considering that ES 6.x has already >> > been out for a while). Do you think this would simply how our base >> module >> > APIs are designed? >> > - Wouldn't it be possible to have a REST implementation of the >> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If >> > so, >> > once we remove ES 1.x and 2.x, it might actually be possible to >> > completely >> > replace the current `elasticsearch-base` module. >> > >> > Cheers, >> > Gordon >> > >> > >> > On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier < >> pomperma...@okkam.it >> > >> > >> > >> > wrote: >> > >> > +1. Torally agree >> > >> > On Sat, 12 May 2018, 18:14 Christophe Jolif, <cjo...@gmail.com> wrote: >> > >> > Hi all, >> > >> > There is quite some time Flink Elasticsearch sink is broken for >> > Elastisearch 5.x (nearly a year): >> > >> > https://issues.apache.org/jira/browse/FLINK-7386 >> > >> > And there is no support for Elasticsearch 6.x: >> > >> > https://issues.apache.org/jira/browse/FLINK-8101 >> > >> > However several PRs were issued: >> > >> > https://github.com/apache/flink/pull/4675 >> > https://github.com/apache/flink/pull/5374 >> > >> > I also raised the issue on the mailing list in the 1.5 timeframe: >> > >> > http://apache-flink-mailing-list-archive.1008284.n3. >> > nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905 >> > >> > But things are still not really moving. However this seems something >> > >> > people >> > >> > are looking for, so I would really like the community to consider that >> > >> > for >> > >> > 1.6. >> > >> > The problems I see from comments on the PRs: >> > >> > - getting something that is following the Flink APIs initially created >> > >> > is a >> > >> > nightmare because Elastic is pretty good at breaking compatibility the >> > >> > hard >> > >> > way (see in particular in the last PR the cast that have to be made to >> > >> > get >> > >> > an API that works in all cases) >> > - Elasticsearch is moving away from their "native" API Flink is using >> > >> > to >> > >> > the REST APIs so there is little common ground between pre 6 and post >> > >> > 6 >> > >> > even if Elasticsearch tried to get some level of compatibility in the >> > >> > APIs. >> > >> > >> > My fear is that by trying to kill two birds with one stone, we actually >> > >> > get >> > >> > nothing done. >> > >> > In the hope of moving that forward I would like to propose for 1.6 a >> > >> > new >> > >> > Elasticsearch 6.x+ sink that would follow the design of the previous >> > >> > ones >> > >> > BUT only leverage the new REST API and not inherit from existing >> > >> > classes. >> > >> > It would really be close to what is in my previous PR: >> > https://github.com/apache/flink/pull/5374 but just focus on E6+/REST >> > >> > and >> > >> > so >> > avoid any "strange" cast. >> > >> > This would not fill the gap of the 5.2+ not working but at least we >> > >> > would >> > >> > be back on track with something for the future as REST API is where >> > >> > Elastic >> > >> > is going. >> > >> > If people feel there is actual interest and chances this can be merged >> > >> > I'll >> > >> > be working on issuing a new PR around that. >> > >> > Alternative is to get back working on the existing PR but it seems to >> > >> > be >> > >> > a >> > >> > dead-end at the moment and not necessarily the best option long term as >> > anyway Elasticsearch is looking into promoting the REST API. >> > >> > Please let me know what you think? >> > >> > -- >> > Christophe >> > >> >> >>