Re: [DISCUSS] [Storm-Kafka] - Maintenance, Branch Support, and Deprecation Plans for storm-kafka and storm-kaka-client

P. Taylor Goetz Wed, 19 Jul 2017 13:17:31 -0700

+1 I’m fine with taking this approach.

-Taylor


> On Jul 19, 2017, at 2:04 PM, Stig Rohde Døssing <[email protected]> wrote:
> 
> +1 for removing storm-kafka from master, since we shouldn't encourage
> people to use a component that won't work on new Kafka versions. As you
> both mentioned, the 1.x version of storm-kafka should still be usable on a
> 2.0 cluster, so it will still be available in case people need it. A wiki
> page for tracking current missing pieces for storm-kafka-client sounds good.
> 
> 2017-07-19 19:09 GMT+02:00 Harsha <[email protected]>:
> 
>> +1 on moving away from storm-kafka for Storm 2.0. For existing users we
>> can provide any critical bug fixes as part of the 1.x releases. They can
>> still use the existing 1.x storm-kafka against 2.0. Since Kafka itself
>> is moving away from the older APIs, maintaining two versions of the
>> Kafka connector doesn’t make sense, and it honestly splits the usage,
>> which doesn’t give us any feedback on the new storm-kafka-client.
>> Thanks,
>> Harsha
>> 
>> On Wed, Jul 19, 2017, at 09:20 AM, Hugo Da Cruz Louro wrote:
>>> Hi,
>>> 
>>> The goal of this email is to summarize and unify the discussion started
>>> across several email threads (Storm 2.0
>>> Roadmap<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>,
>>> 1.1.1 Release
>>> Planning<http://search-hadoop.com/m/Storm/8gnYyGagLDWv1qG?subj=Release+Planning+for+1+1+1+and+others+>,
>>> Lag
>>> Issues<http://search-hadoop.com/m/Storm/8gnYyLmjIjYr692?subj=Lag+issues+using+Storm+1+1+1+latest+build+with+StormKafkaClient+1+1+1+vs+old+StormKafka+spouts>)
>>> concerning the maintenance, branch support, and eventual deprecation of
>>> storm-kafka and storm-kafka-client.
>>> 
>>> An earlier
>>> discussion<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>
>>> proposed deprecating storm-kafka in favor of storm-kafka-client. To
>>> clarify, the idea is not to completely eliminate storm-kafka, but rather
>>> to keep supporting it in the 1.x-branch, while removing it from master
>>> (i.e. Storm 2.0 onwards). That is, storm-kafka-client will then become
>>> the only Storm Kafka option available for Storm 2.0 onwards, provided
>>> that we have enough confidence in its stability by the time of the
>>> Storm 2.0 release.
>>> 
>>> The main reason for this proposal is the fact that the Kafka community
>>> agreed<https://cwiki.apache.org/confluence/display/KAFKA/KIP-109:+Old+Consumer+Deprecation>
>>> to deprecate the old consumer APIs starting in version 0.10.2, and will
>>> remove them in the next major version (0.12). This implies that
>>> storm-kafka will not work for Kafka 0.12 onwards. Important features
>>> missing in the old Kafka consumer are: security, the new message format,
>>> and fetching offsets based on timestamp (KIP-79).
>>> 
>>> In earlier discussions the Storm community has shown concerns about the
>>> performance and stability of the storm-kafka-client. Those concerns are
>>> valid and were mirrored by the Kafka community in their early deprecation
>>> discussions. I agree with what was said in the Kafka
>>> discussion<http://search-hadoop.com/m/Kafka/uyzND1e4bUP1Rjq721>: the
>>> storm-kafka-client has bugs, but so does storm-kafka, and all the
>>> development is currently going into storm-kafka-client, which will be
>>> even more the case once Kafka discontinues the old consumer
>>> APIs. The only way to stabilize a complex component such as
>>> storm-kafka-client is to test it extensively in all its variants, which
>>> inevitably requires users running it. Furthermore, removing storm-kafka
>>> from Storm 2.0 does not prevent users from still referring to storm-kafka
>>> version 1.x in their topologies.
>>> 
>>> I did a quick analysis of the JIRA issues for storm-kafka and
>>> storm-kafka-client [1].  As of July 11 there are 22 open or in-progress
>>> bugs for storm-kafka (1 blocker) and 15 for storm-kafka-client.
>>> 
>>> The recent refactoring around manual partition assignment should solve a
>>> lot of edge case bugs that occurred during rebalance. There are also a
>>> few open pull requests for Trident and for fixing some internal state
>>> details such as maxUncommittedOffsets, topic compaction, etc.
>>> Nevertheless, there are several areas that need to be addressed to
>>> stabilize and improve storm-kafka-client. Similarly to what was done for
>>> Storm SQL, I suggest that we create a wiki page where we can centralize
>>> some points of action such as:
>>> 
>>> Features / Stability
>>> * Memory Footprint
>>> * Retrial Mechanism
>>> * Exactly once and at least once guarantees
>>> * Kafka Lag
>>> * Metrics
>>> * Spout Internals (e.g. maxUncommittedOffsets, ack, emitted, failed,
>>> ...)
>>> * Autocommit mode
>>> 
>>> Performance
>>> * Run performance benchmarks
>>> 
>>> Integration Testing
>>> * Test for exactly once in non-failure scenarios (e.g.
>>> activate/deactivate)
>>> * Test for at least once in failure scenarios
>>> * Test Trident guarantees
>>> 
>>> Unit Testing
>>> * Identify unit test coverage and find a modular way to continually add
>>> new tests
>>> 
>>> Trident
>>>  * Pull request<https://github.com/apache/storm/pull/2174> for review
>>> 
>>> API
>>>  * Investigate gaps in the API between storm-kafka and
>>>  storm-kafka-client.
>>>  * Can we discontinue the old API?
>>> 
>>> Documentation
>>>  * Check for accuracy and completeness of documentation
>>>  * Make clean code snippets with examples available
>>> 
>>> [1] - The data was extracted from JIRA on 07/11/2017. The
>>> storm-kafka-client JIRAs were checked for correctness of component label,
>>> and had their status updated. None of that was done for the storm-kafka
>>> JIRAs, therefore some of its issues marked as open may already have been
>>> fixed. The results and charts can be found here:
>>>    * storm-kafka-jiras<https://docs.google.com/spreadsheets/d/1pdqAKDtqfhPrfgFxnQa4bSrKP1YBdMyuGzqr3gLzcMA/edit?usp=sharing>
>>>    * storm-kafka-client-jiras<https://docs.google.com/spreadsheets/d/12g0HLz4pgODMVVOmzvti1nzLOa6iygmk8pyTOv8op1c/edit?usp=sharing>
>>
