[ 
https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395777#comment-14395777
 ] 

Jay Kreps commented on STORM-650:
---------------------------------

Hey [~wurstmeister] and [~ptgoetz], we are going to ship the new Kafka consumer 
API in the next Kafka release. I believe it should cover the needs you have and 
should dramatically simplify the storm-kafka code: it internally handles server 
failure, partition migration, offset storage, etc., but gives you full control 
over partition assignment and offset commit points, which are exactly what a 
stream processing system needs. It should also remove all unnecessary threading 
from your code, since the consumer is fully non-blocking. Prior to the Kafka 
release it would be great if someone who knows the storm-kafka integration well 
could do a deep dive and validate both that these APIs really cover your needs 
and that they significantly simplify the implementation. We think both are 
true, but it would be good to check so we can make changes if needed. Once the 
API is released, every subsequent change will break compatibility, so flushing 
these things out now is much easier.
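
To give a feel for it, here is a rough sketch of how a spout task could drive 
the new consumer with manual partition assignment and manual commits. This 
assumes the org.apache.kafka.clients.consumer API; the topic name, offsets, and 
the emit step are placeholders, not storm-kafka code.

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class SpoutConsumerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "storm-kafka-spout");
            // The spout decides when an offset is safe to commit (after acks),
            // so auto-commit is disabled.
            props.put("enable.auto.commit", "false");
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
                // Manual partition assignment: this spout task owns the partition
                // outright, with no consumer-group rebalancing involved.
                TopicPartition partition = new TopicPartition("events", 0);
                consumer.assign(Collections.singletonList(partition));

                // Start from an offset recovered from the spout's own state.
                long startOffset = 0L;
                consumer.seek(partition, startOffset);

                long lastEmitted = startOffset;
                while (true) {
                    // A single poll() loop replaces background fetcher threads;
                    // it returns within the given timeout.
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(100);
                    for (ConsumerRecord<byte[], byte[]> record : records) {
                        // Placeholder: hand the record to the topology as a tuple.
                        lastEmitted = record.offset();
                    }
                    // Placeholder commit point: a real spout would commit only
                    // offsets whose tuples have been acked. Committing
                    // lastEmitted + 1 marks everything up to lastEmitted as done.
                    consumer.commitSync(Collections.singletonMap(
                        partition, new OffsetAndMetadata(lastEmitted + 1)));
                }
            }
        }
    }

Because commits are explicit, the spout can defer them until the corresponding 
tuples have been acked, which is the bookkeeping the current code does through 
ZooKeeper.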

I'd be happy to jump on a quick call to discuss if that is useful.

> Storm-Kafka Refactoring and Improvements
> ----------------------------------------
>
>                 Key: STORM-650
>                 URL: https://issues.apache.org/jira/browse/STORM-650
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka
>            Reporter: P. Taylor Goetz
>
> This is intended to be a parent/umbrella JIRA covering a number of 
> efforts/suggestions aimed at improving the storm-kafka module. The goal is to 
> facilitate communication and collaboration by providing a central point for 
> discussion and coordination.
> The first phase should be to identify and agree upon a list of high-level 
> points we would like to address. Once that is complete, we can move on to 
> implementation/design discussions, followed by an implementation plan, 
> division of labor, etc.
> A non-exhaustive, initial list of items follows. New/additional thoughts can 
> be proposed in the comments.
> * Improve API for Specifying the Kafka Starting Point
> Configuring the Kafka spout's starting position (e.g. forceFromStart=true) is 
> a common source of confusion. This should be refactored to provide an 
> easy-to-understand, unambiguous API for configuring this property.
> * Use Kafka APIs Instead of Internal ZK Metadata (STORM-590)
> Currently the Kafka spout relies on reading Kafka's internal metadata from 
> ZooKeeper. This should be refactored to use the Kafka Consumer API to protect 
> against changes to the internal metadata format stored in ZK.
> * Improve Error Handling
> There are a number of failure scenarios with the Kafka spout that users may 
> want to react to differently depending on their use case. Add a failure 
> handler API that allows users to implement and/or plug in alternative failure 
> handling strategies. Sane default implementations would be included and 
> enabled out of the box.
> * Configuration/General Refactoring (BrokerHosts, etc.) (STORM-631)
> (need to flesh this out better) Reduce unnecessary marker 
> interfaces/instanceof checks. Unify configuration of the core Storm and 
> Trident spout implementations.
> * Kafka Spout doesn't pick up from the beginning of the queue unless 
> forceFromStart is specified (STORM-563)
> Discussion Items:
> * How important is backward compatibility?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
