Re: Ratelimiting in the Flink Kafka connector

Thomas Weise Fri, 01 Feb 2019 08:20:51 -0800

Hi Becket,

The throttling operator would suffer from the same issue of not being able
to accurately count bytes.
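
To make the byte-counting problem concrete, here is a minimal sketch in plain Java. It is not Flink or Kafka connector code: ByteTokenBucket, reserve and estimateBytes are hypothetical names chosen for illustration. A throttling operator downstream of the deserializer only sees decoded objects, so it has to re-estimate each record's size (here by re-encoding a String as UTF-8), and that estimate may differ from the serialized size that was actually read from the broker.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.LongSupplier;

/** Hypothetical token bucket that meters bytes; an illustration only, not a Flink API. */
final class ByteTokenBucket {
    private final long bytesPerSecond;
    private final LongSupplier nanoClock; // injectable so the logic can be tested without sleeping
    private double availableBytes;
    private long lastRefillNanos;

    ByteTokenBucket(long bytesPerSecond, LongSupplier nanoClock) {
        if (bytesPerSecond <= 0) {
            throw new IllegalArgumentException("bytesPerSecond must be positive");
        }
        this.bytesPerSecond = bytesPerSecond;
        this.nanoClock = nanoClock;
        this.availableBytes = bytesPerSecond; // start with one second of budget
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    /** Books {@code bytes} and returns how many nanoseconds the caller should wait (0 if none). */
    long reserve(long bytes) {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) / 1e9 * bytesPerSecond;
        availableBytes = Math.min(bytesPerSecond, availableBytes + refill);
        lastRefillNanos = now;
        availableBytes -= bytes; // may go negative; the debt is paid off by waiting
        if (availableBytes >= 0) {
            return 0L;
        }
        return (long) Math.ceil(-availableBytes / bytesPerSecond * 1e9);
    }

    /**
     * Size estimate for an already decoded record. Assumption: records are Strings and
     * UTF-8 re-encoding approximates the wire size, which is not guaranteed in general.
     */
    static long estimateBytes(String decodedRecord) {
        return decodedRecord.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

A caller would compute estimateBytes(record), pass it to reserve, and sleep for the returned duration before emitting. The estimate is the weak point: counting inside the consumer, before decoding, would see the true byte count.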


On the other hand, it can be used by composition w/o modifying existing
operators.

As for sinks, wouldn't an operator that adjusts the rate in front of the
sink suffice?
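
As a rough sketch of the composition idea, assuming a record-count limit rather than a byte limit: ThrottlingStage below is a hypothetical pass-through stage that wraps whatever consumes the records next. It uses plain java.util.function types so it stays self-contained. In a Flink job the same pacing logic would presumably live in a function chained in front of the sink, but that wiring is not shown here.

```java
import java.util.concurrent.locks.LockSupport;
import java.util.function.Consumer;
import java.util.function.LongConsumer;
import java.util.function.LongSupplier;

/** Hypothetical pass-through stage that paces records before handing them downstream. */
final class ThrottlingStage<T> implements Consumer<T> {
    private final Consumer<T> downstream;
    private final long nanosPerRecord;
    private final LongSupplier nanoClock; // injectable clock
    private final LongConsumer sleeper;   // injectable "sleep for n nanoseconds"
    private long nextFreeNanos;

    ThrottlingStage(Consumer<T> downstream, long recordsPerSecond,
                    LongSupplier nanoClock, LongConsumer sleeper) {
        if (recordsPerSecond <= 0 || recordsPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("recordsPerSecond must be in [1, 1e9]");
        }
        this.downstream = downstream;
        this.nanosPerRecord = 1_000_000_000L / recordsPerSecond;
        this.nanoClock = nanoClock;
        this.sleeper = sleeper;
        this.nextFreeNanos = nanoClock.getAsLong();
    }

    /** Convenience factory using the system clock; parkNanos may return early. */
    static <T> ThrottlingStage<T> blocking(Consumer<T> downstream, long recordsPerSecond) {
        return new ThrottlingStage<T>(
                downstream, recordsPerSecond, System::nanoTime, LockSupport::parkNanos);
    }

    @Override
    public void accept(T record) {
        long now = nanoClock.getAsLong();
        long wait = nextFreeNanos - now;
        if (wait > 0) {
            // Blocking here is what would build up back pressure toward the upstream operators.
            sleeper.accept(wait);
        }
        // Schedule the slot for the next record relative to this record's slot.
        nextFreeNanos = Math.max(now, nextFreeNanos) + nanosPerRecord;
        downstream.accept(record);
    }
}
```

Usage would look like ThrottlingStage.blocking(mySink, 100), where mySink is any Consumer. The stage needs no changes to the wrapped consumer, which is the appeal of composition, but it still only counts records or estimated sizes, not the bytes the sink actually writes.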

Thomas


On Thu, Jan 31, 2019 at 11:42 PM Becket Qin <[email protected]> wrote:

> Hi Thomas,
>
> Good point about counting bytes. It would be difficult to throttle the byte
> rate with the existing API. And it seems that for sinks we have to do that
> rate limiting in the sink implementation anyways. There are a few ways to
> do some abstraction, but maybe adding a RateLimiter is trivial enough so we
> don't need to worry about reusing the throttling logic.
>
> But in any case, let's make sure the throttling threshold configuration
> names are the same for all the Sources and Sinks. So the config parsing
> logic should probably still be kept in one place. Those are probably
> implementation details we can discuss when reviewing the patch.
>
> I am not sure about adding another throttling operator. How would that
> operator get the serialized size if it is downstream of a source? And how
> would that work on the sink side?
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Fri, Feb 1, 2019 at 11:33 AM Thomas Weise <[email protected]> wrote:
>
> > I initially thought of an approach similar to the collector idea, by
> > overriding emitRecord in the fetcher. That makes counting the bytes
> > difficult, because it's downstream of decoding.
> >
> > Another idea for solving this in a reusable way was to have a separate
> > rate limiting operator chained downstream of the consumer, which would
> > develop back pressure and slow down the consumer. However, that would
> > interfere with checkpoint barrier alignment (AFAIK, currently the
> > checkpoint barrier will also be stuck in the backlog)?
> >
> > Thomas
> >
> >
> >
> > On Thu, Jan 31, 2019 at 7:13 PM Ken Krugler <[email protected]> wrote:
> >
> > > +1, and something I was planning to comment on in the Jira issue.
> > >
> > > Also, if rate limiting could effectively stop the stream, then this
> > > could be used to solve a common data enrichment issue.
> > >
> > > Logically you want to pause one stream (typically the time series data
> > > being processed) while another stream (typically the broadcast) is
> > > broadcasting an update to enrichment data.
> > >
> > > Currently you have to buffer the time series data in your enrichment
> > > function, but if the rate limiter was pluggable, it could detect when
> > > this enrichment update was happening.
> > >
> > >  — Ken
> > >
> > > > On Jan 31, 2019, at 6:10 PM, Becket Qin <[email protected]> wrote:
> > > >
> > > > Hi Jamie,
> > > >
> > > > Thanks for the explanation. That makes sense to me. I am wondering if
> > > > there is a more general way to add a rate limiter to all the connectors
> > > > rather than doing that for each individual one. For example, maybe we
> > > > can have the rate limiting logic in the Collector / Output, so that all
> > > > the connectors (even operators?) could be rate limited.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > On Fri, Feb 1, 2019 at 7:23 AM Lakshmi Gururaja Rao <[email protected]> wrote:
> > > >
> > > >> Thanks for adding more context @Jamie Grier <[email protected]> .
> > > >>
> > > >> JIRA for this feature:
> > > >> https://issues.apache.org/jira/browse/FLINK-11501.
> > > >>
> > > >> Thanks
> > > >> Lakshmi
> > > >>
> > > >> On Thu, Jan 31, 2019 at 3:20 PM Thomas Weise <[email protected]> wrote:
> > > >>
> > > >>> I think it would be reasonable to have a rate limiter option in the
> > > >>> consumer, given that others have also looked to solve this.
> > > >>>
> > > >>> I think for this and other optional features, it would be good to
> > > >>> implement them in a way that makes overrides possible. Someone else
> > > >>> may want to do the limiting differently, taking into account
> > > >>> more/other factors.
> > > >>>
> > > >>> Adding the limiter and making the consumer code more adaptable could
> > > >>> also be split into separate work.
> > > >>>
> > > >>> BTW is there a JIRA for this?
> > > >>>
> > > >>> Thomas
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> *Lakshmi Gururaja Rao*
> > > >> SWE
> > > >> 217.778.7218 <+12177787218>
> > > >> [image: Lyft] <http://www.lyft.com/>
> > > >>
> > >
> > > --------------------------
> > > Ken Krugler
> > > +1 530-210-6378
> > > http://www.scaleunlimited.com
> > > Custom big data solutions & training
> > > Flink, Solr, Hadoop, Cascading & Cassandra
> > >
> > >
> >
>
