Thanks for the background.

I picked up the network classes portion of it, since I was already looking at
how to refactor send/receive and friends to support extending with TLS and
SASL. Having to do this in just one place will be really nice :)

Gwen

On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey all,
>
> Someone asked about why there is code duplication between org.apache.common
> and core. The answer seemed like it might be useful to others, so including
> it here:
>
> Originally Kafka was more of a proof of concept and we didn't separate the
> clients from the server. LinkedIn was much smaller and it wasn't open
> source, and keeping those separate always adds a lot of overhead. So we
> ended up with just one big jar.
>
> Next thing we know the Kafka jar is embedded everywhere. Lots of fallout
> from that:
> - It has to be really sensitive to dependencies
> - Scala causes all kinds of pain for users. Ironically, it causes the most
> pain for people using Scala because of compatibility problems. I think the
> single biggest Kafka complaint was the Scala clients and the resulting scary
> exceptions, lack of javadoc, etc.
> - Many of the client interfaces weren't well thought out as permanent
> long-term commitments.
> - We knew we had to rewrite both clients due to technical deficiencies
> anyway. The clients really needed to move to non-blocking I/O, which is
> basically a rewrite on its own.
>
> So how to go about that?
>
> Well, we felt we needed to maintain the old client interfaces for a good
> period of time. Any kind of breaking cut-over was a non-starter.
> But a major refactoring in place was really hard since so many classes were
> public and so little attention had been paid to the difference between
> public and private classes.
>
> Naturally, since the client and server do the inverse of each other, there
> is a ton of shared logic. So we thought we needed to break it up into three
> independent chunks:
> 1. common - shared helper code used by both clients and server
> 2. clients - the producer, consumer, and eventually admin java interfaces.
> This depends on common.
> 3. server - the server (and legacy clients). This is currently called core.
> This will depend on common and clients (because sometimes the server needs
> to make client requests)
>
> Common and clients were left as a single jar and just logically separate so
> that people wouldn't have to deal with two jars (and hence the possibility
> of getting different versions of each).
>
> The dependency is actually a little counter-intuitive to people--they
> usually think of the client as depending on the server since the client
> calls the server. But in terms of code dependencies it is the other way--if
> you depend on the client you obviously don't want to drag in the server.
>
> So to get all this done we decided to just go big and do a rewrite of the
> clients in Java. A result of this is that any shared code would have to
> move to Java (so the clients don't pull in Scala). We felt this was
> probably a good thing in its own right as it gave a chance to improve a few
> of these utility libraries like config parsing, etc.
>
> So the plan was and is:
> 1. Rewrite producer, release and roll out
> 2a. Rewrite consumer, release and roll out
> 2b. Migrate server from Scala code to org.apache.common classes
> 3. Deprecate Scala clients
>
> (2a) is in flight now, and that means (2b) is totally up for grabs. Of
> these, the request conversion is definitely the most pressing, since having
> those defined twice duplicates a ton of work. We will have to be
> hyper-conscientious during the conversion about making the shared code in
> common really solve the problem well and conveniently on the server as well
> (so we don't end up just shoe-horning it in). My hope is that we can treat
> this common code really well--it isn't as permanent as the public classes
> but ends up heavily used, so we should take good care of it. Most of the shared
> code is private so we can refactor the stuff in common to meet the needs of
> the server if we find mismatches or missing functionality. I tried to keep
> in mind the eventual server usage while writing it, but I doubt it will be
> as trivial as just deleting the old and adding the new.
>
> In terms of the simplicity:
> - Converting exceptions should be trivial
> - Converting utils is straightforward, but we should evaluate the
> individual utilities and see if they actually make sense, have tests, are
> used, etc.
> - Converting the requests may not be too complex but touches a huge hunk of
> code and may require some effort to decouple the network layer.
> - Converting the network code will be delicate and may require some changes
> in org.apache.common.network to meet the server's needs
>
> This is all a lot of work, but if we stick to it, at the end we will have
> really nice clients and a nice modular code base. :-)
>
> Cheers,
>
> -Jay
>
