Argh, I just realized that the producer and consumer have already almost removed that, so it wouldn't be in common but just something for the broker. Maybe a 0.9/1.0 item to crack into later this year.
On Sun, Feb 8, 2015 at 11:34 AM, Joe Stein <joe.st...@stealth.ly> wrote:
> Jay,
>
> Can we add another package (or two) to org.apache.kafka.common for metadata and consensus? We can call them something else, but the idea would be to have one common layer for metadata information (right now we put the JSON into zookeeper) and one common layer for asynchronous watches (where we wait for zookeeper to call us back). It would be great to have that code be something we can wrap zkclient (or Curator) around, so it insulates us from the different options growing in both of those areas.
>
> For both the metadata code and the async watches we would be able to run any class we load in that supports the expected interface. The async watch interface can take a callback as input to pass to the loaded class, and when the watcher fires (regardless of whether it comes from etcd or zookeeper) the code gets the response it expected and needed. We should also expose a function that returns a future from the watcher.
>
> This may also be a little more work if we wanted to take the JSON and turn it into a byte structure ... or do we just keep the JSON and keep making it describable and self-documenting?
>
> For the metadata information, I think that is separate because that data right now (outside of Kafka) already resides in other systems like databases and/or caches. Folks may opt just to switch the metadata out, reducing the burden on zookeeper to just handling the asynchronous watches. Some folks may want to swap both out.
>
> These two layers could also just be 2-3 more files in utils.
>
> - Joestein
>
> On Sun, Feb 8, 2015 at 11:04 AM, Gwen Shapira <gshap...@cloudera.com> wrote:
>
>> Thanks for the background.
>>
>> I picked the Network classes portion of it, since I was already looking at how to refactor send/receive and friends to support extending with TLS and SASL. Having to do this in just one place will be really nice :)
>>
>> Gwen
>>
>> On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>>
>> > Hey all,
>> >
>> > Someone asked why there is code duplication between org.apache.kafka.common and core. The answer seemed like it might be useful to others, so I'm including it here:
>> >
>> > Originally Kafka was more of a proof of concept and we didn't separate the clients from the server. LinkedIn was much smaller and it wasn't open source, and keeping those separate always adds a lot of overhead. So we ended up with just one big jar.
>> >
>> > Next thing we know, the kafka jar is embedded everywhere. Lots of fallout from that:
>> > - It has to be really sensitive to dependencies
>> > - Scala causes all kinds of pain for users. Ironically it causes the most pain for people using Scala, because of compatibility. I think the single biggest Kafka complaint was the Scala clients and the resulting scary exceptions, lack of javadoc, etc.
>> > - Many of the client interfaces weren't well thought out as permanent long-term commitments.
>> > - We knew we had to rewrite both clients due to technical deficiencies anyway. The clients really needed to move to non-blocking I/O, which is basically a rewrite on its own.
>> >
>> > So how to go about that?
>> >
>> > Well, we felt we needed to maintain the old client interfaces for a good period of time. Any kind of breaking cut-over was kind of a non-starter.
>> > But a major refactoring in place was really hard, since so many classes were public and so little attention had been paid to the difference between public and private classes.
>> >
>> > Naturally, since the client and server do the inverse of each other, there is a ton of shared logic. So we thought we needed to break it up into three independent chunks:
>> > 1. common - shared helper code used by both clients and server
>> > 2. clients - the producer, consumer, and eventually admin Java interfaces. This depends on common.
>> > 3. server - the server (and legacy clients). This is currently called core. This will depend on common and clients (because sometimes the server needs to make client requests).
>> >
>> > Common and clients were left as a single jar and just logically separate, so that people wouldn't have to deal with two jars (and hence the possibility of getting different versions of each).
>> >
>> > The dependency is actually a little counter-intuitive to people--they usually think of the client as depending on the server since the client calls the server. But in terms of code dependencies it is the other way--if you depend on the client you obviously don't want to drag in the server.
>> >
>> > So to get all this done we decided to just go big and do a rewrite of the clients in Java. A result of this is that any shared code would have to move to Java (so the clients don't pull in Scala). We felt this was probably a good thing in its own right, as it gave us a chance to improve a few of these utility libraries like config parsing, etc.
>> >
>> > So the plan was and is:
>> > 1. Rewrite producer, release and roll out
>> > 2a. Rewrite consumer, release and roll out
>> > 2b. Migrate server from Scala code to org.apache.kafka.common classes
>> > 3. Deprecate Scala clients
>> >
>> > (2a) is in flight now, and that means (2b) is totally up for grabs. Of these, the request conversion is definitely the most pressing, since having those defined twice duplicates a ton of work. We will have to be hyper-conscientious during the conversion about making the shared code in common really solve the problem well and conveniently on the server as well (so we don't end up just shoe-horning it in). My hope is that we can treat this common code really well--it isn't as permanent as the public classes but it ends up heavily used, so we should take good care of it. Most of the shared code is private, so we can refactor the stuff in common to meet the needs of the server if we find mismatches or missing functionality. I tried to keep the eventual server usage in mind while writing it, but I doubt it will be as trivial as just deleting the old and adding the new.
>> >
>> > In terms of simplicity:
>> > - Converting exceptions should be trivial
>> > - Converting utils is straightforward, but we should evaluate the individual utilities and see if they actually make sense, have tests, are used, etc.
>> > - Converting the requests may not be too complex, but it touches a huge chunk of code and may require some effort to decouple the network layer.
>> > - Converting the network code will be delicate and may require some changes in org.apache.kafka.common.network to meet the server's needs.
>> >
>> > This is all a lot of work, but if we stick to it, at the end we will have really nice clients and a nice modular code base. :-)
>> >
>> > Cheers,
>> >
>> > -Jay
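
A rough sketch of the kind of pluggable metadata/watch layer Joe describes above. All of the names below are hypothetical and not existing Kafka code; they only illustrate the shape of the idea: one interface for metadata storage and one for asynchronous watches, with both a callback-style and a future-returning entry point, so that zkclient/Curator, etcd, or a database could sit behind them.

    import java.util.concurrent.Future;

    /** Hypothetical: stores and retrieves cluster metadata (today, JSON blobs in zookeeper). */
    interface MetadataStore {
        void put(String path, byte[] data);
        byte[] get(String path);
        void delete(String path);
    }

    /** Hypothetical: delivers asynchronous change notifications from whatever backs the store. */
    interface WatchService {
        /** Callback style: invoke the listener every time the watched path changes. */
        void watch(String path, WatchListener listener);

        /** Future style: completes when the next change on the path is observed. */
        Future<WatchEvent> watchOnce(String path);
    }

    interface WatchListener {
        void onEvent(WatchEvent event);
    }

    /** Hypothetical event type handed to listeners and futures. */
    final class WatchEvent {
        final String path;
        final byte[] data; // new value, or null if the path was deleted

        WatchEvent(String path, byte[] data) {
            this.path = path;
            this.data = data;
        }
    }

A zookeeper-backed implementation would wrap zkclient or Curator behind these two interfaces, while an etcd- or database-backed one could be loaded instead without the broker code changing.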
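
And a toy illustration of the common/clients/server split Jay describes: the point of moving request definitions into common is that the wire format is written once and used from both sides. The class below is made up for illustration (it is not the actual Kafka request code); it just shows a request that knows how to serialize and parse itself, so the Java client calls toBytes() and the server calls fromBytes() on the same class instead of each defining the format separately.

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    /** Hypothetical request type living in common: it only knows its own wire format. */
    final class ExampleMetadataRequest {
        final String topic;

        ExampleMetadataRequest(String topic) {
            this.topic = topic;
        }

        /** Used by the client before handing the bytes to the network layer. */
        ByteBuffer toBytes() {
            byte[] t = topic.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(4 + t.length);
            buf.putInt(t.length);
            buf.put(t);
            buf.flip();
            return buf;
        }

        /** Used by the server to parse the same format; no second definition in core. */
        static ExampleMetadataRequest fromBytes(ByteBuffer buf) {
            int len = buf.getInt();
            byte[] t = new byte[len];
            buf.get(t);
            return new ExampleMetadataRequest(new String(t, StandardCharsets.UTF_8));
        }
    }

Because core depends on common (and clients), the Scala server can call such a class directly; the reverse dependency never appears, which is the counter-intuitive direction Jay mentions above.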