Hi Gordon,

A first step: https://github.com/apache/flink/pull/6043

--
Christophe

On Wed, May 16, 2018 at 12:16 PM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
wrote:

> Good to know! Thanks a lot for pushing this Christophe.
>
> Please ping me when the new PR is opened, and we can continue the
> discussion there.
> Ideally we have this in early in the 1.6 release cycle, so that the
> Elasticsearch e2e tests (will be merging a PR for that soon) can catching
> anything unexpected.
>
> Cheers,
> Gordon
>
>
> On 16 May 2018 at 4:26:36 PM, Christophe Jolif (cjo...@gmail.com) wrote:
>
> Ok thanks for the feedback.
>
> > I agree. IIRC, the ES PRs that were opened also did this by changing
> the return type from Client to AutoClosable, as well as letting the call
> bridge also handle creation of BulkProcessors, correct?
>
> Correct.
>
> > Instead, we maintain our own request class to abstract those classes
> away, and only create the actual Elasticsearch requests internally.
>
> I'm personally unsure I would like to have to use Flink specific request
> classes instead of Elastic ones but apart from that I think we are pretty
> much inline.
>
> I'm not exactly sure when I'll get the cycles but I will try to issue yet
> another PR along those lines so we can continue the discussion from there
> and hopefully we would get something in time for 1.6!
>
> Thanks again,
> --
> Christophe
>
> On Wed, May 16, 2018 at 7:19 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org>
> wrote:
>
>> Hi,
>>
>> What if the user in a ES5.3+ case is calling the deprecated method? You
>> agree it will fail? I'm not necessarily against that. I just want to make
>> it clear that we don't have a perfect solution here either?
>>
>>
>> I think what we could do here is at least in the deprecated
>> `add(ActionRequest)` method, try casting to either `IndexRequest` or
>> `DeleteRequest` and forward calls to the new methods.
>> As a matter of fact, this is exactly what the Elasticsearch BulkProcessor
>> API is doing internally [1] [2] [3], so we should be safe here.
>> Looking at the code in [1] [2] [3], we should actually also consider it
>> being a `UpdateRequest` (just to correct my previous statement that it can
>> only be either index or delete).
>> But yes, I think there wouldn’t be a perfect solution here; something has
>> to be broken / deprecated in order for our implementation to be more future
>> proof.
>>
>> indeed we always considered that among
>> ActionRequest only delete and index were actually supported (which makes
>> sense to me).
>>
>>
>> Looking at the code in [1] [2] [3], we should actually also consider it
>> being a `UpdateRequest` (just to correct my previous statement that it can
>> only be either index or delete).
>> And yes, the Elasticsearch Javadocs of the BulkProcessor also clearly
>> state this.
>>
>> For the REST part, there is another incompatibility at the "Client" API
>> level. Indeed the RestClient is not implementing the "Client" interface
>> that is exposed in the bridge. So "just" doing the change above would
>> still
>> not allow to provide a REST based implementation?
>>
>>
>> I agree. IIRC, the ES PRs that were opened also did this by changing the
>> return type from Client to AutoClosable, as well as letting the call bridge
>> also handle creation of BulkProcessors, correct?
>> I think this is definitely the way to go for us to be more future proof.
>> The general rule of thumb is that, for all APIs (either APIs of the base
>> module that the ES connectors internally use, or user-facing APIs of the
>> connector), we should move towards not leaking Elasticsearch classes as the
>> API.
>> This goes for removing Client as the return type in the call bridge
>> interface. When re-working the RequestIndexer interface, we can even
>> consider not directly exposing the `IndexRequest`, `DeleteRequest`,
>> `UpdateRequest` classes as the API.
>> Instead, we maintain our own request class to abstract those classes
>> away, and only create the actual Elasticsearch requests internally.
>>
>> Cheers,
>> Gordon
>>
>> [1] https://github.com/elastic/elasticsearch/blob/v5.2.2/cor
>> e/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L110
>> [2] https://github.com/elastic/elasticsearch/blob/v6.2.4/ser
>> ver/src/main/java/org/elasticsearch/action/bulk/BulkRequest.java#L128
>>
>> On 16 May 2018 at 4:12:15 AM, Christophe Jolif (cjo...@gmail.com) wrote:
>>
>> Hi Gordon,
>>
>> On Tue, May 15, 2018 at 6:16 AM, Tzu-Li (Gordon) Tai <tzuli...@apache.org
>> >
>> wrote:
>>
>> > Hi,
>> >
>> > Let me first clarify a few things so that we are on the same page here:
>> >
>> > 1. The reason that we are looking into having a new base-module for
>> > versions 5.3+ is that
>> > new Elasticsearch BulkProcessor APIs are breaking some of our original
>> > base API assumptions.
>> > This is noticeable from the intended cast here [1].
>> >
>> > 2. Given that we are looking into a new base module, Christophe proposed
>> > that we focus on designing
>> > the new base module around Elasticsearch’s new REST API, so that we are
>> > more-future proof.
>> >
>> > 3. I proposed to remove / deprecate support for earlier versions,
>> because
>> > once we introduce the new
>> > base module, we would be maintaining a lot of Elasticsearch connector
>> > modules (2 base modules, 4+ version specific).
>> > Of course, whether or not this really is a problem depends on how much
>> > capacity the community has to maintain them, as Steve mentioned.
>> >
>> >
>> > Now, moving on to another proposed solution that should work (at least
>> for
>> > now, depends on how Elasticsearch changes their API in the future):
>> > The main problem we’ve bumped into is that Elasticsearch changed their
>> > BulkProcessor API from accepting `add(ActionRequest)`s to
>> > `add(DocWriteRequest)`s.
>> >
>> > The reason why our elasticsearch-base-module used the `ActionRequest` in
>> > the first place was so that sinks could have full functionality to
>> perform
>> > both delete and write requests.
>> > We can actually just consider separating `RequestIndexer#add(ActionRequ
>> est
>> > …)` to `RequestIndexer#add(IndexRequest)` and `RequestIndexer#add(
>> > DeleteRequest)`.
>> > AFAIK, the IndexRequest class and DeleteRequest class has remained
>> stable
>> > across all ES versions (1.x., 2.x, 5.x, 6.x), so we should be safe with
>> > that.
>> >
>> > If I’m not mistaken, this should allow us to avoid introducing a new
>> > module (no need for a new base AND a 5.3+ module), still have equal
>> > functionality, and continue to use the existing base module across all
>> > versions.
>> > The only downside would be that we’ll need to deprecate / break
>> > `RequestIndexer#add(ActionRequest …)` in favor of the new separated
>> > methods.
>> >
>> > Whether or not we move on to use the HighLevelRestClient for 6.x, would
>> > then be a completely orthogonal consideration. We can simply use it in
>> 6.x
>> > as an implementation of the API call bridge.
>> >
>> > What do you think?
>> >
>>
>> What if the user in a ES5.3+ case is calling the deprecated method? You
>> agree it will fail? I'm not necessarily against that. I just want to make
>> it clear that we don't have a perfect solution here either? If we ignore
>> that it should be working fine if indeed we always considered that among
>> ActionRequest only delete and index were actually supported (which makes
>> sense to me).
>>
>> For the REST part, there is another incompatibility at the "Client" API
>> level. Indeed the RestClient is not implementing the "Client" interface
>> that is exposed in the bridge. So "just" doing the change above would
>> still
>> not allow to provide a REST based implementation?
>>
>> Best,
>> > Gordon
>> >
>> > [1] https://github.com/apache/flink/pull/4675/files#diff-
>> > 5539d0d57fa1ac17a36f08d1bb9a90b5R54
>> >
>> > On 15 May 2018 at 9:15:52 AM, Steve Blackmon (sblack...@apache.org)
>> wrote:
>> >
>> > It seems to me that if the transport client dependency is removed, the
>> > same
>> > module could perform inserts, updates, and deletes via the http bulk
>> API,
>> > and whatever version differences exist with that API could be handled
>> > inside the module without any difference to the classpath of the
>> > pipeline.
>> >
>> > If that's true there's no need or benefit to deprecating support for
>> > earlier elastic version so long as someone is willing to implement test
>> > and
>> > maintain them.
>> >
>> > Steve
>> >
>> > On 5/13/18 at 2:00 PM, Christophe wrote:
>> >
>> > Hi Gordon,
>> >
>> > Thanks for your feedback (and Flavio for your support!)
>> >
>> > About your remarks/questions:
>> >
>> > - Maybe we can consider removing support for ES 1.x and 2.x starting
>> from
>> >
>> > 1.6. Those are very old ES versions (considering that ES 6.x has already
>> > been out for a while). Do you think this would simply how our base
>> module
>> > APIs are designed?
>> >
>> > I would tend to say it should not change drastically the picture but
>> > would
>> > have to look into it.
>> >
>> > - Wouldn't it be possible to have a REST implementation of the
>> >
>> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
>> > so,
>> > once we remove ES 1.x and 2.x, it might actually be possible to
>> > completely
>> > replace the current `elasticsearch-base` module.
>> >
>> > The High level REST API was introduced in Elasticsearch 5.6 so it is not
>> > possible to cover 5.5 and below with it.
>> >
>> > If all the necessary APIs are already here (to be double checked) it
>> > should
>> > be able cover 5.6. What I noticed when working on the PRs is that 6.2
>> > REST
>> > Level High Level client API was improved to be closer to original APIs,
>> > if
>> > we want to support 5.6 with it we might have to rely on APIs they
>> already
>> > improved since then. Not dramatic. But does it worth it knowing this
>> > would
>> > just be giving us 5.6 not 5.2,3,4 and 5?
>> >
>> > Now on moving forward I read:
>> >
>> > I'm definitely a +1 to try to move this forward with a proper fix.
>> >
>> >
>> > and
>> >
>> > Working around that would require introducing a new base module
>> >
>> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't
>> a
>> > nice way to go.
>> >
>> > So if I read you correctly you are ok moving with a proper fix but it
>> > should not introduce a new (REST based) base module? Then to be honest
>> > I'm
>> > not sure how to proceed :) Any more specific feedback on the direction
>> to
>> > follow would be great!
>> >
>> > Thanks,
>> > --
>> > Christophe
>> >
>> > On Sun, May 13, 2018 at 5:39 AM, Tzu-Li (Gordon) Tai <
>> tzuli...@apache.org>
>> >
>> > wrote:
>> >
>> > Hi Christophe,
>> >
>> > Thanks for bringing this up.
>> >
>> > Yes, the main issue with the existing PRs and preventing it from moving
>> > forward is how it currently breaks initial assumptions of APIs in the
>> > `elasticsearch-base` module.
>> > Working around that would require introducing a new base module
>> > specifically for 5.3+ and 6.x+, which we've also agreed on the PR isn't
>> a
>> > nice way to go.
>> >
>> > I had a quick stab at the REST API, and it seems to be promising,
>> > especially given that you mentioned that starting from next versions,
>> the
>> > current API we use will be fully removed.
>> > I'm definitely a +1 to try to move this forward with a proper fix.
>> >
>> > Some other remarks / questions I have:
>> > - Maybe we can consider removing support for ES 1.x and 2.x starting
>> from
>> > 1.6. Those are very old ES versions (considering that ES 6.x has already
>> > been out for a while). Do you think this would simply how our base
>> module
>> > APIs are designed?
>> > - Wouldn't it be possible to have a REST implementation of the
>> > `ElasticsearchSinkCallBridge` for 5.x (covering both 5.2 and 5.3+)? If
>> > so,
>> > once we remove ES 1.x and 2.x, it might actually be possible to
>> > completely
>> > replace the current `elasticsearch-base` module.
>> >
>> > Cheers,
>> > Gordon
>> >
>> >
>> > On Sun, May 13, 2018 at 12:36 AM, Flavio Pompermaier <
>> pomperma...@okkam.it
>> >
>> >
>> >
>> > wrote:
>> >
>> > +1. Torally agree
>> >
>> > On Sat, 12 May 2018, 18:14 Christophe Jolif, <cjo...@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > There is quite some time Flink Elasticsearch sink is broken for
>> > Elastisearch 5.x (nearly a year):
>> >
>> > https://issues.apache.org/jira/browse/FLINK-7386
>> >
>> > And there is no support for Elasticsearch 6.x:
>> >
>> > https://issues.apache.org/jira/browse/FLINK-8101
>> >
>> > However several PRs were issued:
>> >
>> > https://github.com/apache/flink/pull/4675
>> > https://github.com/apache/flink/pull/5374
>> >
>> > I also raised the issue on the mailing list in the 1.5 timeframe:
>> >
>> > http://apache-flink-mailing-list-archive.1008284.n3.
>> > nabble.com/DISCUSS-Releasing-Flink-1-5-0-td20867.html#a20905
>> >
>> > But things are still not really moving. However this seems something
>> >
>> > people
>> >
>> > are looking for, so I would really like the community to consider that
>> >
>> > for
>> >
>> > 1.6.
>> >
>> > The problems I see from comments on the PRs:
>> >
>> > - getting something that is following the Flink APIs initially created
>> >
>> > is a
>> >
>> > nightmare because Elastic is pretty good at breaking compatibility the
>> >
>> > hard
>> >
>> > way (see in particular in the last PR the cast that have to be made to
>> >
>> > get
>> >
>> > an API that works in all cases)
>> > - Elasticsearch is moving away from their "native" API Flink is using
>> >
>> > to
>> >
>> > the REST APIs so there is little common ground between pre 6 and post
>> >
>> > 6
>> >
>> > even if Elasticsearch tried to get some level of compatibility in the
>> >
>> > APIs.
>> >
>> >
>> > My fear is that by trying to kill two birds with one stone, we actually
>> >
>> > get
>> >
>> > nothing done.
>> >
>> > In the hope of moving that forward I would like to propose for 1.6 a
>> >
>> > new
>> >
>> > Elasticsearch 6.x+ sink that would follow the design of the previous
>> >
>> > ones
>> >
>> > BUT only leverage the new REST API and not inherit from existing
>> >
>> > classes.
>> >
>> > It would really be close to what is in my previous PR:
>> > https://github.com/apache/flink/pull/5374 but just focus on E6+/REST
>> >
>> > and
>> >
>> > so
>> > avoid any "strange" cast.
>> >
>> > This would not fill the gap of the 5.2+ not working but at least we
>> >
>> > would
>> >
>> > be back on track with something for the future as REST API is where
>> >
>> > Elastic
>> >
>> > is going.
>> >
>> > If people feel there is actual interest and chances this can be merged
>> >
>> > I'll
>> >
>> > be working on issuing a new PR around that.
>> >
>> > Alternative is to get back working on the existing PR but it seems to
>> >
>> > be
>> >
>> > a
>> >
>> > dead-end at the moment and not necessarily the best option long term as
>> > anyway Elasticsearch is looking into promoting the REST API.
>> >
>> > Please let me know what you think?
>> >
>> > --
>> > Christophe
>> >
>>
>>
>>

Reply via email to