+1 for removing storm-kafka from master, since we shouldn't encourage
people to use a component that won't work on new Kafka versions. As you
both mentioned, the 1.x version of storm-kafka should still be usable on a
2.0 cluster, so it will still be available in case people need it. A wiki
page for tracking current missing pieces for storm-kafka-client sounds good.

2017-07-19 19:09 GMT+02:00 Harsha <[email protected]>:

> +1 on moving away from storm-kafka for Storm 2.0. For existing users we
> can provide any critical bug fixes and provide it as part of 1.x
> releases. They can still use the existing 1.x storm-kafka against 2.0.
> Since Kafka itself is moving away from the older APIs, maintaining two
> versions of the Kafka connector doesn't make sense, and honestly it splits
> the usage, which keeps us from getting feedback on the new storm-kafka-client.
> Thanks,
> Harsha
>
> On Wed, Jul 19, 2017, at 09:20 AM, Hugo Da Cruz Louro wrote:
> > Hi,
> >
> > The goal of this email is to summarize and unify the discussion started
> > across several email threads (Storm 2.0
> > Roadmap<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>,
> > 1.1.1 Release
> > Planning<http://search-hadoop.com/m/Storm/8gnYyGagLDWv1qG?subj=Release+Planning+for+1+1+1+and+others+>,
> > Lag
> > Issues<http://search-hadoop.com/m/Storm/8gnYyLmjIjYr692?subj=Lag+issues+using+Storm+1+1+1+latest+build+with+StormKafkaClient+1+1+1+vs+old+StormKafka+spouts>)
> > concerning the maintenance, branch support, and eventual deprecation of
> > storm-kafka and storm-kafka-client.
> >
> > An earlier
> > discussion<http://search-hadoop.com/?project=Storm&q=%22%5BDISCUSS%5D+Storm+2.0+Roadmap%22>
> > proposed deprecating storm-kafka in favor of storm-kafka-client. To
> > clarify, the idea is not to completely eliminate storm-kafka, but rather
> > keep supporting it in the 1.x-branch, while removing it from master (i.e.
> > Storm 2.0 onwards). That is, storm-kafka-client will then become the only
> > Storm Kafka option available for Storm 2.0 onwards, given that we have
> > enough confidence in its stability by the time of the Storm 2.0 release.
> >
> > The main reason for this proposal is the fact that the Kafka community
> > agreed<https://cwiki.apache.org/confluence/display/KAFKA/KIP-109:+Old+Consumer+Deprecation>
> > to deprecate the old consumer APIs starting in version 0.10.2, and will
> > remove them in the next major version (0.12). This implies that
> > storm-kafka will not work for Kafka 0.12 onwards. Important features
> > missing in the old Kafka consumer are: security, new message format, and
> > fetching offsets based on time stamp (KIP-79).
> >
> > In earlier discussions the Storm community has shown concerns about the
> > performance and stability of the storm-kafka-client. Those concerns are
> > valid and were mirrored by the Kafka community in their early deprecation
> > discussions. I agree with what was said in the Kafka
> > discussion<http://search-hadoop.com/m/Kafka/uyzND1e4bUP1Rjq721>: the
> > storm-kafka-client has bugs, but so does storm-kafka, and all the
> > development is currently going into storm-kafka-client, a trend that
> > will only grow as Kafka discontinues the old consumer
> > APIs. The only way to stabilize a complex component such as
> > storm-kafka-client is to test it extensively in all its variants, which
> > inevitably comes from users using it. Furthermore, removing storm-kafka
> > from Storm 2.0 does not prevent users from still referring to storm-kafka
> > version 1.x in their topologies.
> >
> > I did a quick analysis of the JIRA issues for storm-kafka and
> > storm-kafka-client [1].  As of July 11 there are 22 open or in-progress
> > bugs for storm-kafka (1 blocker) and 15 for storm-kafka-client.
> >
> > The recent refactoring around manual partition assignment should solve a
> > lot of edge case bugs that occurred during rebalance. There are also a
> > few open pull requests for Trident and for fixing some internal state
> > details such as maxUncommittedOffsets, topic compaction, etc.
> > Nevertheless, there are several areas that need to be addressed to
> > stabilize and improve storm-kafka-client. Similarly to what was done for
> > Storm SQL I suggest that we create a wiki page where we can centralize
> > some points of action such as:
> >
> > Features / Stability
> >  * Memory Footprint
> >  * Retrial Mechanism
> >  * Exactly once and at least once guarantees
> >  * Kafka Lag
> >  * Metrics
> >  * Spout Internals (e.g. maxUncommittedOffsets, ack, emitted, failed,
> >  ...)
> >  * Autocommit mode
> >
> > Performance
> >  * Run performance benchmarks
> >
> > Integration Testing
> >  * Test for exactly once in non-failure scenarios (e.g.
> >  activate/deactivate)
> >  * Test for at least once in failure scenarios
> >  * Test Trident guarantees
> >
> > Unit Testing
> >  * Identify unit test coverage and find a modular way to continually add
> >  new tests
> >
> > Trident
> >   * Pull request<https://github.com/apache/storm/pull/2174> for review
> >
> > API
> >   * Investigate gaps in the API between storm-kafka and
> >   storm-kafka-client.
> >   * Can we discontinue the old API?
> >
> > Documentation
> >   * Check for accuracy and completeness of documentation
> >   * Make clean code snippets with examples available
> >
> > [1] - The data was extracted from JIRA on 07/11/2017. The
> > storm-kafka-client JIRAs were checked for correctness of component label,
> > and had their status updated. None of that was done for the storm-kafka
> > JIRAs, therefore some of its issues marked as open may already have been
> > fixed. The results and charts can be found here:
> >     * storm-kafka-jiras<https://docs.google.com/spreadsheets/d/1pdqAKDtqfhPrfgFxnQa4bSrKP1YBdMyuGzqr3gLzcMA/edit?usp=sharing>
> >     * storm-kafka-client-jiras<https://docs.google.com/spreadsheets/d/12g0HLz4pgODMVVOmzvti1nzLOa6iygmk8pyTOv8op1c/edit?usp=sharing>
>
