Thanks for the background. I picked the Network classes portion of it, since I was already looking at how to refactor send/receive and friends to support extending with TLS and SASL. Having to do this in just one place will be really nice :)
Gwen

On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> Hey all,
>
> Someone asked about why there is code duplication between org.apache.common and core. The answer seemed like it might be useful to others, so including it here:
>
> Originally Kafka was more of a proof of concept and we didn't separate the clients from the server. LinkedIn was much smaller and it wasn't open source, and keeping those separate always adds a lot of overhead. So we ended up with just one big jar.
>
> Next thing we know the kafka jar is embedded everywhere. Lots of fallout from that:
> - It has to be really sensitive to dependencies
> - Scala causes all kinds of pain for users. Ironically it causes the most pain for people using Scala because of compatibility. I think the single biggest Kafka complaint was the Scala clients and resulting scary exceptions, lack of javadoc, etc.
> - Many of the client interfaces weren't well thought out as permanent long-term commitments.
> - We knew we had to rewrite both clients due to technical deficiencies anyway. The clients really needed to move to non-blocking I/O which is basically a rewrite on its own.
>
> So how to go about that?
>
> Well we felt we needed to maintain the old client interfaces for a good period of time. Any kind of breaking cut-over was kind of a non-starter. But a major refactoring in place was really hard since so many classes were public and so little attention had been paid to the difference between public and private classes.
>
> Naturally since the client and server do the inverse of each other there is a ton of shared logic. So we thought we needed to break it up into three independent chunks:
> 1. common - shared helper code used by both clients and server
> 2. clients - the producer, consumer, and eventually admin Java interfaces. This depends on common.
> 3. server - the server (and legacy clients). This is currently called core.
>    This will depend on common and clients (because sometimes the server needs to make client requests)
>
> Common and clients were left as a single jar and just logically separate so that people wouldn't have to deal with two jars (and hence the possibility of getting different versions of each).
>
> The dependency is actually a little counter-intuitive to people: they usually think of the client as depending on the server since the client calls the server. But in terms of code dependencies it is the other way: if you depend on the client you obviously don't want to drag in the server.
>
> So to get all this done we decided to just go big and do a rewrite of the clients in Java. A result of this is that any shared code would have to move to Java (so the clients don't pull in Scala). We felt this was probably a good thing in its own right as it gave a chance to improve a few of these utility libraries like config parsing, etc.
>
> So the plan was and is:
> 1. Rewrite producer, release and roll out
> 2a. Rewrite consumer, release and roll out
> 2b. Migrate server from Scala code to org.apache.common classes
> 3. Deprecate Scala clients
>
> (2a) is in flight now, and that means (2b) is totally up for grabs. Of these the request conversion is definitely the most pressing since having those defined twice duplicates a ton of work. We will have to be hyper-conscientious during the conversion about making the shared code in common really solve the problem well and conveniently on the server as well (so we don't end up just shoe-horning it in). My hope is that we can treat this common code really well: it isn't as permanent as the public classes but ends up heavily used so we should take good care of it. Most of the shared code is private so we can refactor the stuff in common to meet the needs of the server if we find mismatches or missing functionality.
> I tried to keep in mind the eventual server usage while writing it, but I doubt it will be as trivial as just deleting the old and adding the new.
>
> In terms of the simplicity:
> - Converting exceptions should be trivial
> - Converting utils is straightforward but we should evaluate the individual utilities and see if they actually make sense, have tests, are used, etc.
> - Converting the requests may not be too complex but touches a huge hunk of code and may require some effort to decouple the network layer.
> - Converting the network code will be delicate and may require some changes in org.apache.common.network to meet the server's needs
>
> This is all a lot of work, but if we stick to it, at the end we will have really nice clients and a nice modular code base. :-)
>
> Cheers,
>
> -Jay
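
The dependency direction described in the quoted mail (common at the bottom, clients on top of common, server on top of both) can be sketched as a small graph check. This is only an illustration in Java: the class name, the map, and the module labels are made up for the example and are not Kafka's actual build configuration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not taken from the Kafka build files.
final class ModuleDependencies {
    // Direct code dependencies of each logical chunk, as described in the mail.
    static final Map<String, List<String>> DIRECT = Map.of(
        "common", List.of(),
        "clients", List.of("common"),
        "server", List.of("common", "clients"));

    // Collects every module reachable from `module` by following dependencies.
    static Set<String> transitiveDependencies(String module) {
        Set<String> seen = new HashSet<>();
        Deque<String> pending =
            new ArrayDeque<>(DIRECT.getOrDefault(module, List.of()));
        while (!pending.isEmpty()) {
            String next = pending.pop();
            if (seen.add(next)) {
                pending.addAll(DIRECT.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Depending on the clients must never drag in the server.
        System.out.println("clients -> " + transitiveDependencies("clients"));
        System.out.println("server  -> " + transitiveDependencies("server"));
    }
}
```

The point of the check is the asymmetry: the server may reach the clients, but nothing reachable from the clients may lead back to the server.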