Re: [DISCUSS] KIP-304: Connect runtime mode improvements for container platforms

Rahul Singh Thu, 17 May 2018 05:11:09 -0700

First sentence fat fingered.

“Just curious as to why there’s an issue with the backing topics for Kafka 
Connect.”


--
Rahul Singh
rahul.si...@anant.us

Anant Corporation

On May 17, 2018, 6:17 AM -0400, Stephane Maarek 
<steph...@simplemachines.com.au>, wrote:
> Hi Saulius
>
> I think you're on the money, but you're not pushing things far enough.
> This is something I've hoped for for a long time.
> Let's talk Kafka Connect v2.
>
> Kafka Connect clusters, as you said, are not convenient to work with (the
> KIP details the drawbacks well). I'm all for containerisation, just as
> stream apps support (and boast!).
>
> Now, here's the problem with Kafka Connect. There are three backing topics.
> Here's the analysis of how they can evolve:
> - Config topic: this one is irrelevant if each connect cluster comes with a
> config bundled with the corresponding JAR, as you mentioned in your KIP
> - Status topic: this is something I wish was gone too. The consumers have a
> coordinator, and I believe the connect workers should have a coordinator
> too, for task rebalancing.
> - Source Offset topic: only relevant for sources. I wish there was a
> __connect_offsets global topic just like for consumers and a
> "ConnectOffsetCoordinator" to ask for the latest committed offset.
>
> If we look at the above, with a few fundamental back-end transformations, we
> can probably make Connect "cluster-less".
>
> What the community would get out of it is huge:
> - Connect workers for a specific connector are independent and isolated,
> measurable (in CPU and Mem) and auto-scalable
> - CI/CD is super easy to integrate, as it's just another container / jar.
> - You can roll restart a specific connector and upgrade a JAR without
> interrupting your other connectors, while keeping the current connector
> running.
> - The topics backing connect are removed except the global one, which
> allows you to scale easily in terms of number of connectors
> - Running a connector in dev or prod (for people offering connectors) is as
> easy as doing a simple "docker run".
> - Each consumer / producer setting can be configured at the container
> level.
> - Each connect process is immutable in configuration.
> - Each connect process has its own security identity (right now, you need a
> connect cluster per service role, which is a lot of overhead in terms of
> backing topics)
>
> Now, I don't have the Kafka expertise to know exactly which changes to make
> in the code, but I believe the final idea is achievable.
> The change would be breaking for how Kafka Connect is run, but I think
> there's a chance to make the change non breaking to how Connect is
> programmed. I believe the same public API framework can be used.
>
> Finally, the REST API can be used for monitoring, or the JMX metrics as
> usual.
>
> I may be completely wrong, but I could see such a change improving the
> utilisation and management of Connect by a lot while lowering the barrier to
> adoption.
>
> This change may be big to implement but probably worthwhile. I'd be happy
> to provide more "user feedback" on a PR, but probably won't be able to
> implement a PR myself.
>
> More than happy to discuss this
>
> Best,
> Stephane
>
>
> Kind regards,
> Stephane
>
> Stephane Maarek | Developer
>
> +61 416 575 980
> steph...@simplemachines.com.au
> simplemachines.com.au
> Level 2, 145 William Street, Sydney NSW 2010
>
> On 17 May 2018 at 14:42, Saulius Valatka <saulius...@gmail.com> wrote:
>
> > Hi,
> >
> > the only real use case for the REST interface I can see is providing
> > health/liveness checks for mesos/kubernetes. It's also true that the API
> > can be left as is and e.g. not exposed publicly on the platform level, but
> > this would still leave opportunities to accidentally mess something up
> > internally, so it's mostly a safety concern.
> >
> > Regarding the option renaming: I agree that it's not necessary as it's not
> > clashing with anything. My reasoning is that, assuming some other offset
> > storage appears in the future, having all config properties at the root
> > level of offset.storage.* _MIGHT_ introduce clashes in the future, so this
> > is just a suggestion for introducing a convention of
> > offset.storage.<store>.<properties>, which the existing
> > property offset.storage.file.filename already adheres to. But in general,
> > yes -- this can be left as is.
> >
> >
> >
> > 2018-05-17 1:20 GMT+03:00 Jakub Scholz <ja...@scholz.cz>:
> >
> > > Hi,
> > >
> > > What do you plan to use the read-only REST interface for? Is there
> > > something that you cannot get through the metrics interface? Otherwise it
> > > might be easier to just disable the REST interface (either in the code, or
> > > just on the platform level - e.g. in Kubernetes).
> > >
> > > Also, I do not know what the usual approach in Kafka is ... but do we
> > > really have to rename the offset.storage.* options? The current names do
> > > not seem to have any collision with what you are adding and they would get
> > > "out of sync" with the other options used in Connect (status.storage.* and
> > > config.storage.*). So it seems a bit of an unnecessary change to me.
> > >
> > > Jakub
> > >
> > >
> > >
> > > On Wed, May 16, 2018 at 10:10 PM Saulius Valatka <saulius...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I'd like to start a discussion on the following KIP:
> > > >
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-304%3A+Connect+runtime+mode+improvements+for+container+platforms
> > > >
> > > > Basically the idea is to make it easier to run separate instances of Kafka
> > > > Connect hosting isolated connectors on container platforms such as Mesos or
> > > > Kubernetes.
> > > >
> > > > In particular it would be interesting to hear opinions about the proposed
> > > > read-only REST API mode. More specifically, I'm concerned about whether it
> > > > can be implemented in distributed mode, as it appears the framework
> > > > is using it internally (
> > > > https://github.com/apache/kafka/blob/trunk/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L1019
> > > > ), however this particular API method appears to be undocumented(?).
> > > >
> > > > Looking forward to your feedback.
> > > >
> > >
> >
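
For reference, the health/liveness use case raised in the thread could look
roughly like the sketch below. It assumes the worker's REST interface is
reachable from the probe and serves GET /connectors/{name}/status as in the
existing Connect REST API. The function names, the base URL, the connector
name and the "everything must be RUNNING" policy are illustrative assumptions,
not something KIP-304 specifies.

```python
import json
import urllib.request


def is_healthy(status: dict) -> bool:
    """Return True when the connector and all of its tasks report RUNNING.

    `status` is the parsed JSON body of GET /connectors/{name}/status.
    Policy assumption: a connector with no tasks listed is treated as
    healthy as long as the connector itself is RUNNING.
    """
    if status.get("connector", {}).get("state") != "RUNNING":
        return False
    return all(task.get("state") == "RUNNING" for task in status.get("tasks", []))


def probe(base_url: str, connector: str, timeout: float = 2.0) -> bool:
    """Fetch the status of one connector and apply is_healthy to it.

    Any network error, HTTP error or malformed body counts as unhealthy.
    """
    url = f"{base_url}/connectors/{connector}/status"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_healthy(json.load(resp))
    except (OSError, ValueError):
        return False


if __name__ == "__main__":
    # Hypothetical worker address and connector name.
    ok = probe("http://localhost:8083", "my-connector")
    raise SystemExit(0 if ok else 1)
```

A container platform could run such a script as an exec-style liveness probe
and act on the exit code. Only a read path is needed for this, which is the
part of the REST interface the read-only mode would keep.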
