Argh, I just realized that the producer and consumer have already almost
removed that, so it wouldn't be in common but just something for the
broker.  Maybe a 0.9/1.0 item to tackle later this year.

On Sun, Feb 8, 2015 at 11:34 AM, Joe Stein <joe.st...@stealth.ly> wrote:

> Jay,
>
> Can we add another package (or two) to org.apache.kafka.common for
> metadata and consensus?  We can call them something else, but the idea
> would be to have one common layer for metadata information (right now we
> put the JSON into zookeeper) and one common layer for asynchronous watches
> (where we wait for zookeeper to call us back). It would be great to have
> that code be something we can wrap zkclient (or Curator) around, so it
> insulates us from the different options growing in both of those areas.
>
> For both the metadata code and the async watches we would be able to run
> any class we load in that supports the expected interface. The async watch
> interface can take a callback as input to pass to the loaded class, and
> when the watcher fires (regardless of whether it comes from etcd or
> zookeeper) the code gets the response it expected and needed. We should
> also expose a function on the watcher that returns a future.
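>
> To make that concrete, here is a rough sketch of the kind of interfaces I
> am picturing--just placeholders to illustrate the idea, none of these
> names exist in the code today:
>
> import java.util.concurrent.Future;
>
> /** Pluggable metadata layer; a zookeeper-, etcd-, or database-backed
>  *  implementation would be loaded by class name from config. */
> interface MetadataStore {
>     byte[] read(String path);
>     void write(String path, byte[] data);
> }
>
> /** Pluggable async watch layer; fires the callback when the backing
>  *  system (zookeeper, etcd, ...) notifies us of a change. */
> interface ChangeWatcher {
>     void watch(String path, WatchCallback callback);
>     Future<byte[]> watchOnce(String path);  // one-shot watch as a future
> }
>
> interface WatchCallback {
>     void onChange(String path, byte[] newData);
> }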
>
> This may also mean a little more work if we wanted to take the JSON and
> turn it into a byte structure ... or do we just stick with the JSON and
> keep making it describable and self-documenting?
>
> For the metadata information, I think that is separate because that data
> right now (outside of kafka) already resides in other systems like
> databases and/or caches. Folks may opt to switch just the metadata out to
> reduce the burden on zookeeper, leaving it to handle only the asynchronous
> watches. Some folks may want to swap both out.
>
> These two layers could also just be 2-3 more files in utils.
>
> - Joestein
>
> On Sun, Feb 8, 2015 at 11:04 AM, Gwen Shapira <gshap...@cloudera.com>
> wrote:
>
>> Thanks for the background.
>>
>> I picked the Network classes portion of it, since I was already looking at
>> how to refactor send/receive and friends to support extending with TLS and
>> SASL. Having to do this in just one place will be really nice :)
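>>
>> Roughly, the idea I'm playing with is to put a thin transport seam under
>> Send/Receive so a TLS or SASL variant can be dropped in later without
>> touching the rest of the network code. Just a sketch, none of these names
>> exist yet:
>>
>> import java.io.IOException;
>> import java.nio.ByteBuffer;
>>
>> /** Hypothetical seam between Send/Receive and the socket so plaintext,
>>  *  TLS, and SASL-wrapped connections can share the same code paths. */
>> interface TransportLayer {
>>     int read(ByteBuffer dst) throws IOException;   // decrypted bytes in
>>     int write(ByteBuffer src) throws IOException;  // plaintext bytes out
>>     boolean handshakeComplete();                   // TLS/SASL handshake state
>> }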
>>
>> Gwen
>>
>> On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>
>> > Hey all,
>> >
>> > Someone asked about why there is code duplication between
>> > org.apache.common and core. The answer seemed like it might be useful to
>> > others, so including it here:
>> >
>> > Originally Kafka was more of a proof of concept and we didn't separate
>> > the clients from the server. LinkedIn was much smaller and it wasn't open
>> > source, and keeping those separate always adds a lot of overhead. So we
>> > ended up with just one big jar.
>> >
>> > Next thing we know the kafka jar is embedded everywhere. Lots of
>> > fallout from that:
>> > - It has to be really sensitive to dependencies
>> > - Scala causes all kinds of pain for users. Ironically it causes the
>> > most pain for people using scala because of compatibility. I think the
>> > single biggest Kafka complaint was the scala clients and the resulting
>> > scary exceptions, lack of javadoc, etc.
>> > - Many of the client interfaces weren't well thought out as permanent
>> > long-term commitments.
>> > - We knew we had to rewrite both clients due to technical deficiencies
>> > anyway. The clients really needed to move to non-blocking I/O, which is
>> > basically a rewrite on its own.
>> >
>> > So how to go about that?
>> >
>> > Well, we felt we needed to maintain the old client interfaces for a
>> > good period of time. Any kind of breaking cut-over was kind of a
>> > non-starter. But a major refactoring in place was really hard since so
>> > many classes were public and so little attention had been paid to the
>> > difference between public and private classes.
>> >
>> > Naturally, since the client and server do the inverse of each other,
>> > there is a ton of shared logic. So we thought we needed to break it up
>> > into three independent chunks:
>> > 1. common - shared helper code used by both clients and server
>> > 2. clients - the producer, consumer, and eventually admin Java
>> > interfaces. This depends on common.
>> > 3. server - the server (and legacy clients). This is currently called
>> > core. This will depend on common and clients (because sometimes the
>> > server needs to make client requests).
>> >
>> > Common and clients were left as a single jar and just logically
>> > separate so that people wouldn't have to deal with two jars (and hence
>> > the possibility of getting different versions of each).
>> >
>> > The dependency is actually a little counter-intuitive to people--they
>> > usually think of the client as depending on the server since the client
>> > calls the server. But in terms of code dependencies it is the other
>> > way--if you depend on the client you obviously don't want to drag in the
>> > server.
>> >
>> > So to get all this done we decided to just go big and do a rewrite of
>> > the clients in Java. A result of this is that any shared code would have
>> > to move to Java (so the clients don't pull in Scala). We felt this was
>> > probably a good thing in its own right, as it gave us a chance to
>> > improve a few of these utility libraries like config parsing, etc.
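>> >
>> > As an example of what I mean: the new config code lets you declare each
>> > setting once with a type, default, importance, and doc string and then
>> > handles validation and defaulting for you. A simplified sketch along
>> > these lines (the particular settings here are just for illustration):
>> >
>> > import java.util.HashMap;
>> > import java.util.Map;
>> > import org.apache.kafka.common.config.ConfigDef;
>> > import org.apache.kafka.common.config.ConfigDef.Importance;
>> > import org.apache.kafka.common.config.ConfigDef.Type;
>> >
>> > public class ConfigExample {
>> >     public static void main(String[] args) {
>> >         // Declare each setting once; parse() validates and fills defaults.
>> >         ConfigDef def = new ConfigDef()
>> >             .define("bootstrap.servers", Type.LIST, Importance.HIGH,
>> >                     "Initial broker list used to discover the cluster")
>> >             .define("send.buffer.bytes", Type.INT, 128 * 1024,
>> >                     Importance.MEDIUM, "Size of the TCP send buffer");
>> >
>> >         Map<String, Object> props = new HashMap<String, Object>();
>> >         props.put("bootstrap.servers", "localhost:9092");
>> >         Map<String, Object> parsed = def.parse(props);
>> >         System.out.println(parsed.get("send.buffer.bytes")); // 131072 default
>> >     }
>> > }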
>> >
>> > So the plan was and is:
>> > 1. Rewrite producer, release and roll out
>> > 2a. Rewrite consumer, release and roll out
>> > 2b. Migrate server from scala code to org.apache.common classes
>> > 3. Deprecate scala clients
>> >
>> > (2a) is in flight now, and that means (2b) is totally up for grabs. Of
>> > these, the request conversion is definitely the most pressing, since
>> > having those defined twice duplicates a ton of work. We will have to be
>> > hyper-conscientious during the conversion about making the shared code
>> > in common really solve the problem well and conveniently on the server
>> > as well (so we don't end up just shoe-horning it in). My hope is that we
>> > can treat this common code really well--it isn't as permanent as the
>> > public classes but it ends up heavily used, so we should take good care
>> > of it. Most of the shared code is private, so we can refactor the stuff
>> > in common to meet the needs of the server if we find mismatches or
>> > missing functionality. I tried to keep in mind the eventual server usage
>> > while writing it, but I doubt it will be as trivial as just deleting the
>> > old and adding the new.
>> >
>> > In terms of the simplicity:
>> > - Converting exceptions should be trivial
>> > - Converting utils is straightforward, but we should evaluate the
>> > individual utilities and see if they actually make sense, have tests,
>> > are used, etc.
>> > - Converting the requests may not be too complex but touches a huge hunk
>> > of code and may require some effort to decouple the network layer.
>> > - Converting the network code will be delicate and may require some
>> > changes in org.apache.common.network to meet the server's needs.
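>> >
>> > For anyone picking up the network piece: the common network code is
>> > built around a plain java.nio selector loop rather than blocking socket
>> > reads. Very roughly, the style looks like the sketch below (illustrative
>> > only, not the actual classes):
>> >
>> > import java.io.IOException;
>> > import java.net.InetSocketAddress;
>> > import java.nio.ByteBuffer;
>> > import java.nio.channels.SelectionKey;
>> > import java.nio.channels.Selector;
>> > import java.nio.channels.SocketChannel;
>> > import java.util.Iterator;
>> >
>> > // One selector, many connections, one loop reacting to readiness events.
>> > public class NioSketch {
>> >     public static void main(String[] args) throws IOException {
>> >         Selector selector = Selector.open();
>> >         SocketChannel channel = SocketChannel.open();
>> >         channel.configureBlocking(false);
>> >         channel.connect(new InetSocketAddress("localhost", 9092));
>> >         channel.register(selector, SelectionKey.OP_CONNECT);
>> >
>> >         ByteBuffer buffer = ByteBuffer.allocate(4096);
>> >         while (true) {
>> >             selector.select(1000);  // wait up to 1s for readiness events
>> >             Iterator<SelectionKey> it = selector.selectedKeys().iterator();
>> >             while (it.hasNext()) {
>> >                 SelectionKey key = it.next();
>> >                 it.remove();
>> >                 if (key.isConnectable() && channel.finishConnect()) {
>> >                     key.interestOps(SelectionKey.OP_READ);  // now wait for data
>> >                 } else if (key.isReadable()) {
>> >                     channel.read(buffer);  // read whatever bytes have arrived
>> >                 }
>> >             }
>> >         }
>> >     }
>> > }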
>> >
>> > This is all a lot of work, but if we stick to it at the end we will have
>> > really nice clients and a nice modular code base. :-)
>> >
>> > Cheers,
>> >
>> > -Jay
>> >
>>
>
>
