There are many ways to approach this problem, and I think I have worked on at least one project that has tried them all:
- Apache MRUnit => kept versions in Maven modules and eventually in separate source trees
- Apache Hive (and Parquet) => used an extensive "shim" layer to allow Hive to work with dozens of Hadoop releases
- StreamSets => each component is a classloader-isolated component

The approach StreamSets has taken is the most sustainable way to support many versions (a rough sketch of what classloader isolation means in practice follows at the end of this message). Back when I worked on https://issues.apache.org/jira/browse/FLUME-1735 we discussed doing something similar with Flume. Having now worked on an implementation of that, I can say it was one of the more difficult problems I've worked on. Doing this in Flume would be a many-month effort which would likely break backwards compatibility.

The shim approach Hive took ended up being a complete nightmare, with someone always complaining whenever we removed support for a Hadoop version. That approach built up a tremendous amount of technical debt. I would not take it again.

The module approach MRUnit took also became a maintenance nightmare. Every patch needs to be tested multiple times, since unrelated changes can break one profile due to dependency version changes, and you have to be able to output multiple artifacts.

For my money, the best approach was the separate source tree, which is the approach Jeff is suggesting. Users who want the older Kafka can use Flume 1.6, and those who want the newer Kafka can use Flume 1.7. Anyone who wants a 1.7 feature with 1.6 can do the cherry-pick. Based on the aforementioned experiences, I would strongly suggest we take this approach.
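For concreteness, here is a minimal, purely illustrative sketch of the classloader-isolation idea. The class names, jar paths, and loader wiring below are assumptions for illustration, not actual StreamSets or Flume code:

import java.net.URL;
import java.net.URLClassLoader;

/**
 * Minimal sketch of classloader isolation: each Kafka-version-specific
 * component gets its own URLClassLoader, so kafka-clients 0.8.x and
 * 0.9.x can coexist in one JVM without colliding on the classpath.
 * All names here are illustrative.
 */
public final class IsolatedComponentLoader {

  public static Object load(URL[] componentJars, String implClass) throws Exception {
    // Parent the component loader off the application classpath's parent,
    // so the component resolves Kafka classes only from its own jars plus
    // the core JDK -- never from the host's classpath.
    ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
    URLClassLoader isolated = new URLClassLoader(componentJars, parent);
    return Class.forName(implClass, true, isolated).newInstance();
  }
}

// Usage (illustrative paths and class names):
//   Object sink08 = IsolatedComponentLoader.load(
//       new URL[] { new URL("file:///opt/components/kafka-0.8/sink.jar") },
//       "com.example.Kafka08Sink");
//   Object sink09 = IsolatedComponentLoader.load(
//       new URL[] { new URL("file:///opt/components/kafka-0.9/sink.jar") },
//       "com.example.Kafka09Sink");

The host then talks to each component through a shared API interface loaded by the parent loader; that interface plumbing is the hard part alluded to above.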
*From:* Ralph Goers <[email protected]>
*Date:* December 22, 2015 at 1:29:08 PM EST
*To:* [email protected]
*Subject:* *Re: [DISCUSSION] Flume-Kafka 0.9 Support*
*Reply-To:* [email protected]

Why not simply move the "old" Kafka components to their own Maven module? Then you can keep them as part of the distribution for the next release or two.

Ralph

On Dec 22, 2015, at 8:10 AM, Jarek Jarcec Cecho <[email protected]> wrote:

It's unfortunate that in order to support the new features in Kafka 0.9 (primarily the security additions), one has to lose support for the previous version (0.8). I do believe that the security additions that have been added recently are important enough for us to migrate to the new version of Kafka and use it for the next Flume release. If some people need to continue using a future Flume version with Kafka 0.8, they should be able to simply take the 1.6.0 versions of the Kafka Channel/Source/Sink jars and use them with the new agent, so we do have a mitigation plan if needed.

Jarcec

On Dec 22, 2015, at 3:26 PM, Jeff Holoman <[email protected]> wrote:

With the new release of Kafka, I wanted to start the discussion on how best to handle updating Flume to be able to make use of some of the new features available in 0.9.

First, it is important for Flume to adopt the 0.9 Kafka clients, as the new Consumer/Producer APIs are the only APIs that support the new security features in the latest Kafka release, such as SSL (a sketch of a secured 0.9 consumer follows at the end of this message). If we agree that this is important, then we need to consider how best to make this switch. With many projects we could just update the jars/clients and move along happily; however, the Kafka compatibility story complicates this:

- Kafka promises to be backward compatible with clients, i.e. a 0.8.x client can talk to a 0.9.x broker.
- Kafka does not promise to be forward compatible (at all) from the client's perspective, i.e. a 0.9.x client cannot talk to a 0.8.x broker. If it works, it is by luck and is not reliable, even for old functionality. This is due to protocol changes and the client having no way to know the version of Kafka it's talking to. Hopefully KIP-35 (retrieving the protocol version) will move this in the right direction.
- Integrations that use Kafka 0.9.x clients will not be able to talk to Kafka 0.8.x brokers at all, and may get cryptic error messages when they try.
- Integrations will only be able to support one major version of Kafka at a time without more complex class-loading. Note: the kafka_2.10 artifact depends on the kafka-clients artifact, so you cannot have kafka-clients and kafka_2.10 of different versions at the same time without collision.
- Older clients (0.8.x) will work when talking to a 0.9.x server, but that is pretty much useless here, as the benefits of 0.9.x (the security features) won't be available.

Given these constraints, and after careful consideration, I propose that we do the following:

1) Update the Kafka libraries to the latest 0.9/0.9+ release and update the Source, Sink, and Channel implementations to make use of the new Kafka clients.
2) Document that Flume no longer supports Kafka brokers < 0.9.

Given that both the producer and consumer clients will be updated, there will need to be changes in agent configurations to support the new clients. This means that if upgrading Flume, existing agent configurations will break. I don't see a clean way around this, unfortunately. This seems to be a situation where we break things, and document that to be the case.
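To make concrete why only the new clients will do: on the 0.9 Consumer API, SSL is enabled purely through client properties. A minimal sketch follows; the broker address, topic name, group id, and truststore path/password are placeholders, not recommended values:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SecureConsumerSketch {
  public static void main(String[] args) {
    Properties props = new Properties();
    // Placeholder broker; 9093 is a conventional SSL listener port.
    props.put("bootstrap.servers", "broker1.example.com:9093");
    props.put("group.id", "flume-group");
    props.put("key.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer",
        "org.apache.kafka.common.serialization.StringDeserializer");
    // These security settings exist only on the new (0.9) clients --
    // the core reason Flume has to move off the 0.8 APIs.
    props.put("security.protocol", "SSL");
    props.put("ssl.truststore.location", "/path/to/truststore.jks"); // placeholder
    props.put("ssl.truststore.password", "changeit");                // placeholder

    KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
    consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
    ConsumerRecords<String, String> records = consumer.poll(1000);
    for (ConsumerRecord<String, String> record : records) {
      System.out.println(record.offset() + ": " + record.value());
    }
    consumer.close();
  }
}

None of these properties have any equivalent on the 0.8 consumer, which is why updating the clients (and therefore the agent configurations) is unavoidable.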
