+1

I’m fine with taking this approach.

-Taylor

> On Jul 19, 2017, at 2:04 PM, Stig Rohde Døssing <[email protected]> wrote:
>
> +1 for removing storm-kafka from master, since we shouldn't encourage
> people to use a component that won't work on new Kafka versions. As you
> both mentioned, the 1.x version of storm-kafka should still be usable on a
> 2.0 cluster, so it will still be available in case people need it. A wiki
> page for tracking current missing pieces for storm-kafka-client sounds good.
>
> 2017-07-19 19:09 GMT+02:00 Harsha <[email protected]>:
>
>> +1 on moving away from storm-kafka for Storm 2.0. For existing users we
>> can provide any critical bug fixes as part of the 1.x releases. They can
>> still use the existing 1.x storm-kafka against 2.0. Since Kafka itself is
>> moving away from the older APIs, continuing two versions of the Kafka
>> connector doesn’t make sense and honestly splits the usage, which doesn’t
>> give us any feedback on the new storm-kafka-client.
>> Thanks,
>> Harsha
>>
>> On Wed, Jul 19, 2017, at 09:20 AM, Hugo Da Cruz Louro wrote:
>>> Hi,
>>>
>>> The goal of this email is to summarize and unify the discussion started
>>> across several email threads (Storm 2.0
>>> Roadmap<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>,
>>> 1.1.1 Release
>>> Planning<http://search-hadoop.com/m/Storm/8gnYyGagLDWv1qG?subj=Release+Planning+for+1+1+1+and+others+>,
>>> Lag
>>> Issues<http://search-hadoop.com/m/Storm/8gnYyLmjIjYr692?subj=Lag+issues+using+Storm+1+1+1+latest+build+with+StormKafkaClient+1+1+1+vs+old+StormKafka+spouts>)
>>> concerning the maintenance, branch support, and eventual deprecation of
>>> storm-kafka and storm-kafka-client.
>>>
>>> An earlier
>>> discussion<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>
>>> proposed deprecating storm-kafka in favor of storm-kafka-client. To
>>> clarify, the idea is not to completely eliminate storm-kafka, but rather
>>> to keep supporting it in the 1.x-branch, while removing it from master
>>> (i.e. Storm 2.0 onwards). That is, storm-kafka-client will then become
>>> the only Storm Kafka option available for Storm 2.0 onwards, provided
>>> that we have enough confidence in its stability by the time of the
>>> Storm 2.0 release.
>>>
>>> The main reason for this proposal is the fact that the Kafka community
>>> agreed<https://cwiki.apache.org/confluence/display/KAFKA/KIP-109:+Old+Consumer+Deprecation>
>>> to deprecate the old consumer APIs starting in version 0.10.2, and will
>>> remove them in the next major version (0.12). This implies that
>>> storm-kafka will not work for Kafka 0.12 onwards. Important features
>>> missing in the old Kafka consumer are: security, new message format, and
>>> fetching offsets based on timestamp (KIP-79).
>>>
>>> In earlier discussions the Storm community has shown concerns about the
>>> performance and stability of the storm-kafka-client. Those concerns are
>>> valid and were mirrored by the Kafka community in their early deprecation
>>> discussions. I agree with what was said in the Kafka
>>> discussion<http://search-hadoop.com/m/Kafka/uyzND1e4bUP1Rjq721>: the
>>> storm-kafka-client has bugs, but so does storm-kafka, and all the
>>> development is currently going into storm-kafka-client, which will be
>>> even more pronounced once Kafka discontinues the old consumer APIs. The
>>> only way to stabilize a complex component such as storm-kafka-client is
>>> to test it extensively in all its variants, which inevitably comes from
>>> users using it. Furthermore, removing storm-kafka from Storm 2.0 does
>>> not prevent users from still referring to storm-kafka version 1.x in
>>> their topologies.
>>>
>>> I did a quick analysis of the JIRA issues for storm-kafka and
>>> storm-kafka-client [1]. As of July 11 there are 22 open or in-progress
>>> bugs for storm-kafka (1 blocker) and 15 for storm-kafka-client.
>>>
>>> The recent refactoring around manual partition assignment should solve a
>>> lot of edge case bugs that occurred during rebalance. There are also a
>>> few open pull requests for Trident and for fixing some internal state
>>> details such as maxUncommittedOffsets, topic compaction, etc.
>>> Nevertheless, there are several areas that need to be addressed to
>>> stabilize and improve storm-kafka-client. Similarly to what was done for
>>> Storm SQL, I suggest that we create a wiki page where we can centralize
>>> some points of action such as:
>>>
>>> Features / Stability
>>> * Memory Footprint
>>> * Retry Mechanism
>>> * Exactly once and at least once guarantees
>>> * Kafka Lag
>>> * Metrics
>>> * Spout Internals (e.g. maxUncommittedOffsets, ack, emitted, failed,
>>> ...)
>>> * Autocommit mode
>>>
>>> Performance
>>> * Run performance benchmarks
>>>
>>> Integration Testing
>>> * Test for exactly once in non-failure scenarios (e.g.
>>> activate/deactivate)
>>> * Test for at least once in failure scenarios
>>> * Test Trident guarantees
>>>
>>> Unit Testing
>>> * Identify unit test coverage and find a modular way to continually add
>>> new tests
>>>
>>> Trident
>>> * Pull request<https://github.com/apache/storm/pull/2174> for review
>>>
>>> API
>>> * Investigate gaps in the API between storm-kafka and
>>> storm-kafka-client.
>>> * Can we discontinue the old API?
>>>
>>> Documentation
>>> * Check for accuracy and completeness of documentation
>>> * Make clean code snippets with examples available
>>>
>>> [1] - The data was extracted from JIRA on 07/11/2017. The
>>> storm-kafka-client JIRAs were checked for correctness of component label,
>>> and had their status updated. None of that was done for the storm-kafka
>>> JIRAs, therefore some of its issues marked as open may already have been
>>> fixed. The results and charts can be found here:
>>> * storm-kafka-jiras<https://docs.google.com/spreadsheets/d/1pdqAKDtqfhPrfgFxnQa4bSrKP1YBdMyuGzqr3gLzcMA/edit?usp=sharing>
>>> * storm-kafka-client-jiras<https://docs.google.com/spreadsheets/d/12g0HLz4pgODMVVOmzvti1nzLOa6iygmk8pyTOv8op1c/edit?usp=sharing>
>>
