Hey all,

Someone asked why there is code duplication between org.apache.common and core. The answer seemed like it might be useful to others, so I'm including it here:
Originally Kafka was more of a proof of concept and we didn't separate the clients from the server. LinkedIn was much smaller and Kafka wasn't open source, and keeping those separate always adds a lot of overhead. So we ended up with just one big jar.

Next thing we know the kafka jar is embedded everywhere. Lots of fallout from that:
- It has to be really sensitive to dependencies.
- Scala causes all kinds of pain for users. Ironically it causes the most pain for people using Scala because of compatibility between Scala versions. I think the single biggest Kafka complaint was the Scala clients and the resulting scary exceptions, lack of javadoc, etc.
- Many of the client interfaces weren't well thought out as permanent long-term commitments.
- We knew we had to rewrite both clients due to technical deficiencies anyway. The clients really needed to move to non-blocking I/O, which is basically a rewrite on its own.

So how to go about that? Well, we felt we needed to maintain the old client interfaces for a good period of time. Any kind of breaking cut-over was a non-starter. But a major refactoring in place was really hard since so many classes were public and so little attention had been paid to the difference between public and private classes.

Naturally, since the client and server do the inverse of each other, there is a ton of shared logic. So we thought we needed to break it up into three independent chunks:
1. common - shared helper code used by both clients and server
2. clients - the producer, consumer, and eventually admin java interfaces. This depends on common.
3. server - the server (and legacy clients). This is currently called core. This will depend on common and clients (because sometimes the server needs to make client requests).

Common and clients were left as a single jar and just logically separate so that people wouldn't have to deal with two jars (and hence the possibility of getting different versions of each).
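To make the direction of those dependencies concrete, here is a minimal sketch. It is Python purely for illustration and is not part of the Kafka build; the module names are just labels for the three chunks above, and `DEPENDS_ON` and `transitive_deps` are made-up names for this example.

```python
# Hypothetical labels for the three chunks described above.
# Each entry maps a module to the modules it depends on directly.
DEPENDS_ON = {
    "common": set(),
    "clients": {"common"},
    "server": {"common", "clients"},  # "server" is what is currently called core
}


def transitive_deps(module, graph=DEPENDS_ON):
    """Return every module reachable from `module` by following dependencies."""
    seen = set()
    stack = list(graph[module])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(graph[dep])
    return seen


# Anything that depends on the clients never drags in the server.
assert "server" not in transitive_deps("clients")
# The server, on the other hand, pulls in both other chunks.
assert transitive_deps("server") == {"common", "clients"}
```

The point of the layering is the first assertion: the edges only ever go from server toward clients and common, never back.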
The dependency is actually a little counter-intuitive to people--they usually think of the client as depending on the server since the client calls the server. But in terms of code dependencies it is the other way--if you depend on the client you obviously don't want to drag in the server.

So to get all this done we decided to just go big and do a rewrite of the clients in Java. A result of this is that any shared code would have to move to Java (so the clients don't pull in Scala). We felt this was probably a good thing in its own right as it gave a chance to improve a few of these utility libraries like config parsing, etc.

So the plan was and is:
1. Rewrite producer, release and roll out
2a. Rewrite consumer, release and roll out
2b. Migrate server from scala code to org.apache.common classes
3. Deprecate scala clients

(2a) is in flight now, and that means (2b) is totally up for grabs. Of the pieces of (2b), the request conversion is definitely the most pressing since having the requests defined twice duplicates a ton of work. We will have to be hyper-conscientious during the conversion about making the shared code in common really solve the problem well and conveniently on the server as well (so we don't end up just shoe-horning it in). My hope is that we can treat this common code really well--it isn't as permanent as the public classes but ends up heavily used so we should take good care of it. Most of the shared code is private so we can refactor the stuff in common to meet the needs of the server if we find mismatches or missing functionality. I tried to keep in mind the eventual server usage while writing it, but I doubt it will be as trivial as just deleting the old and adding the new.

In terms of simplicity:
- Converting exceptions should be trivial.
- Converting utils is straightforward but we should evaluate the individual utilities and see if they actually make sense, have tests, are used, etc.
- Converting the requests may not be too complex but touches a huge hunk of code and may require some effort to decouple the network layer.
- Converting the network code will be delicate and may require some changes in org.apache.common.network to meet the server's needs.

This is all a lot of work, but if we stick to it, at the end we will have really nice clients and a nice modular code base. :-)

Cheers,

-Jay