+1 for removing storm-kafka from master, since we shouldn't encourage people to use a component that won't work on new Kafka versions. As you both mentioned, the 1.x version of storm-kafka should still be usable on a 2.0 cluster, so it will still be available in case people need it. A wiki page for tracking what storm-kafka-client is currently missing sounds good.
2017-07-19 19:09 GMT+02:00 Harsha <[email protected]>:

> +1 on moving away from storm-kafka for Storm 2.0. For existing users we
> can provide any critical bug fixes and provide them as part of 1.x
> releases. They can still use the existing 1.x storm-kafka against 2.0.
> Since Kafka itself is moving away from the older APIs, continuing two
> versions of the Kafka connector doesn't make sense and honestly splits the
> usage, which doesn't give us any feedback on the new storm-kafka-client.
> Thanks,
> Harsha
>
> On Wed, Jul 19, 2017, at 09:20 AM, Hugo Da Cruz Louro wrote:
> > Hi,
> >
> > The goal of this email is to summarize and unify the discussion started
> > across several email threads (Storm 2.0
> > Roadmap<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>,
> > 1.1.1 Release
> > Planning<http://search-hadoop.com/m/Storm/8gnYyGagLDWv1qG?subj=Release+Planning+for+1+1+1+and+others+>,
> > Lag
> > Issues<http://search-hadoop.com/m/Storm/8gnYyLmjIjYr692?subj=Lag+issues+using+Storm+1+1+1+latest+build+with+StormKafkaClient+1+1+1+vs+old+StormKafka+spouts>)
> > concerning the maintenance, branch support, and eventual deprecation of
> > storm-kafka and storm-kafka-client.
> >
> > An earlier
> > discussion<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>
> > proposed deprecating storm-kafka in favor of storm-kafka-client. To
> > clarify, the idea is not to completely eliminate storm-kafka, but rather
> > to keep supporting it in the 1.x-branch, while removing it from master
> > (i.e. Storm 2.0 onwards). That is, storm-kafka-client will then become
> > the only Storm Kafka option available for Storm 2.0 onwards, given that
> > we have enough confidence in its stability by the time of the Storm 2.0
> > release.
> >
> > The main reason for this proposal is the fact that the Kafka community
> > agreed<https://cwiki.apache.org/confluence/display/KAFKA/KIP-109:+Old+Consumer+Deprecation>
> > to deprecate the old consumer APIs starting in version 0.10.2, and will
> > remove them in the next major version (0.12). This implies that
> > storm-kafka will not work for Kafka 0.12 onwards. Important features
> > missing in the old Kafka consumer are: security, new message format, and
> > fetching offsets based on timestamp (KIP-79).
> >
> > In earlier discussions the Storm community has shown concerns about the
> > performance and stability of the storm-kafka-client. Those concerns are
> > valid and were mirrored by the Kafka community in their early deprecation
> > discussions. I agree with what was said in the Kafka
> > discussion<http://search-hadoop.com/m/Kafka/uyzND1e4bUP1Rjq721>: the
> > storm-kafka-client has bugs, but so does storm-kafka, and all the
> > development is currently going into storm-kafka-client, which will be
> > even more the case as Kafka discontinues the old consumer APIs. The only
> > way to stabilize a complex component such as storm-kafka-client is to
> > test it extensively in all its variants, which inevitably comes from
> > users using it. Furthermore, removing storm-kafka from Storm 2.0 does
> > not prevent users from still referring to storm-kafka version 1.x in
> > their topologies.
> >
> > I did a quick analysis of the JIRA issues for storm-kafka and
> > storm-kafka-client [1]. As of July 11 there are 22 open or in-progress
> > bugs for storm-kafka (1 blocker) and 15 for storm-kafka-client.
> >
> > The recent refactoring around manual partition assignment should solve a
> > lot of edge case bugs that occurred during rebalance. There are also a
> > few open pull requests for Trident and for fixing some internal state
> > details such as maxUncommittedOffsets, topic compaction, etc.
> > Nevertheless, there are several areas that need to be addressed to
> > stabilize and improve storm-kafka-client. Similarly to what was done for
> > Storm SQL, I suggest that we create a wiki page where we can centralize
> > some points of action such as:
> >
> > Features / Stability
> > * Memory Footprint
> > * Retrial Mechanism
> > * Exactly once and at least once guarantees
> > * Kafka Lag
> > * Metrics
> > * Spout Internals (e.g. maxUncommittedOffsets, ack, emitted, failed,
> > ...)
> > * Autocommit mode
> >
> > Performance
> > * Run performance benchmarks
> >
> > Integration Testing
> > * Test for exactly once in non-failure scenarios (e.g.
> > activate/deactivate)
> > * Test for at least once in failure scenarios
> > * Test Trident guarantees
> >
> > Unit Testing
> > * Identify unit test coverage and find a modular way to continually add
> > new tests
> >
> > Trident
> > * Pull request<https://github.com/apache/storm/pull/2174> for review
> >
> > API
> > * Investigate gaps in the API between storm-kafka and
> > storm-kafka-client.
> > * Can we discontinue the old API?
> >
> > Documentation
> > * Check for accuracy and completeness of documentation
> > * Make clean code snippets with examples available
> >
> > [1] - The data was extracted from JIRA on 07/11/2017. The
> > storm-kafka-client JIRAs were checked for correctness of component label,
> > and had their status updated. None of that was done for the storm-kafka
> > JIRAs, therefore some of its issues marked as open may already have been
> > fixed. The results and charts can be found here:
> > * storm-kafka-jiras<https://docs.google.com/spreadsheets/d/1pdqAKDtqfhPrfgFxnQa4bSrKP1YBdMyuGzqr3gLzcMA/edit?usp=sharing>
> > * storm-kafka-client-jiras<https://docs.google.com/spreadsheets/d/12g0HLz4pgODMVVOmzvti1nzLOa6iygmk8pyTOv8op1c/edit?usp=sharing>
