[jira] [Commented] (STORM-650) Storm-Kafka Refactoring and Improvements

Jay Kreps (JIRA) Fri, 13 Feb 2015 15:58:36 -0800

    [ https://issues.apache.org/jira/browse/STORM-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14321002#comment-14321002 ]


Jay Kreps commented on STORM-650:
---------------------------------

Hey guys, from my pov the best thing to do would be to get on the new and 
improved Java APIs. We believe these should do what you want and should let 
you dramatically simplify your code.

There are several incidental improvements you should get out of this: (1) no 
more Scala dependency, (2) clients are in their own jar so no pulling in server 
deps, (3) richer API which should make reaching around to ZK unnecessary. These 
are intended to be the long-term supported JVM APIs.

The new producer is ready to go and is in the 0.8.2 release. It is protocol 
compatible with any 0.8.x release so depending on this doesn't require users to 
upgrade their Kafka installation. It's much much faster than the previous 
producer in general but especially when doing synchronous acknowledgements.
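
As a rough illustration of what depending on the new producer involves, here is a minimal configuration sketch. The class and method names (NewProducerConfigSketch, syncAckProducerProps) and the broker address are made up for this example; the property keys are the new producer's standard settings. It only builds the Properties, so it runs without the Kafka jar on the classpath.

```java
import java.util.Properties;

// Hypothetical helper, not part of storm-kafka or Kafka.
final class NewProducerConfigSketch {

    // Builds settings for the new Java producer
    // (org.apache.kafka.clients.producer.KafkaProducer).
    static Properties syncAckProducerProps(String bootstrapServers) {
        Properties props = new Properties();
        // Initial brokers to contact; no ZooKeeper address is needed.
        props.put("bootstrap.servers", bootstrapServers);
        // Wait for the partition leader to acknowledge each write.
        props.put("acks", "1");
        // The new producer requires explicit serializers.
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        return props;
    }

    public static void main(String[] args) {
        // "localhost:9092" is a placeholder address.
        Properties props = syncAckProducerProps("localhost:9092");
        // With kafka-clients on the classpath these would be passed to
        // new KafkaProducer<byte[], byte[]>(props). Calling get() on the
        // Future returned by send(...) blocks until the broker acknowledges
        // the record, which is the synchronous case mentioned above.
        props.list(System.out);
    }
}
```

The acks value and the byte-array serializers are one plausible choice for a bolt that already has serialized payloads; other settings may suit other topologies better.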

The new consumer is on trunk now and is beta quality. It does not yet include 
the regular expression support and it does not yet allow automatic partition 
balancing (that is pending server-side features). Both those are coming, though 
(but I don't think you guys do regex now and you do your own partition 
assignment anyway). The advantage of this new client for the consumer is that 
it gives you full control over your offsets, includes support for committing 
offsets, and gives you full control over partition assignment, but doesn't 
require you to manually discover brokers and manage failover, which is hyper 
error-prone. The timeline for a production-quality release is probably about 3 
months.

The consumer APIs are still changeable as this is pre-release, so if there is 
any gap in what you would need, now would be a fantastic time to flag it. You can 
see the new APIs here:
http://kafka.apache.org/083/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

If anyone would like to have a brief conversation about the APIs I'd be happy 
to do that sometime after next week.

> Storm-Kafka Refactoring and Improvements
> ----------------------------------------
>
>                 Key: STORM-650
>                 URL: https://issues.apache.org/jira/browse/STORM-650
>             Project: Apache Storm
>          Issue Type: Improvement
>          Components: storm-kafka
>            Reporter: P. Taylor Goetz
>
> This is intended to be a parent/umbrella JIRA covering a number of 
> efforts/suggestions aimed at improving the storm-kafka module. The goal is to 
> facilitate communication and collaboration by providing a central point for 
> discussion and coordination.
> The first phase should be to identify and agree upon a list of high-level 
> points we would like to address. Once that is complete, we can move on to 
> implementation/design discussions, followed by an implementation plan, 
> division of labor, etc.
> A non-exhaustive, initial list of items follows. New/additional thoughts can 
> be proposed in the comments.
> * Improve API for Specifying the Kafka Starting point
> Configuring the kafka spout's starting position (e.g. forceFromStart=true) is 
> a common source of confusion. This should be refactored to provide an easy to 
> understand, unambiguous API for configuring this property.
> * Use Kafka APIs Instead of Internal ZK Metadata (STORM-590)
> Currently the Kafka spout relies on reading Kafka's internal metadata from 
> zookeeper. This should be refactored to use the Kafka Consumer API to protect 
> against changes to the internal metadata format stored in ZK.
> * Improve Error Handling
> There are a number of failure scenarios with the kafka spout that users may 
> want to react to differently based on their use case. Add a failure handler 
> API that allows users to implement and/or plug in alternative failure 
> handling implementations. It is assumed that default (sane) implementations 
> would be included and configured by default.
> * Configuration/General Refactoring (BrokerHosts, etc.) (STORM-631)
> (need to flesh this out better) Reduce unnecessary marker 
> interfaces/"instance of" checks. Unify configuration of core storm/trident 
> spout implementations.
> Discussion Items:
> * How important is backward compatibility?
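
To make the "starting point" item in the issue description above a bit more concrete, here is one possible shape for an unambiguous setting, replacing a boolean such as forceFromStart with an explicit enum. Everything in it (StartingOffsetSketch, StartingPosition, resolve, and the fallback rule for out-of-range committed offsets) is a hypothetical illustration, not an agreed design and not existing storm-kafka code.

```java
import java.util.OptionalLong;

// Hypothetical sketch of an explicit starting-position setting.
final class StartingOffsetSketch {

    // Each value names exactly one behavior, so no flag combinations
    // need to be interpreted.
    enum StartingPosition { EARLIEST, LATEST, LAST_COMMITTED }

    // Picks the offset to start reading from for one partition.
    // earliest/latest are the bounds currently available on the broker;
    // committed is the previously stored offset, if any.
    static long resolve(StartingPosition position, OptionalLong committed,
                        long earliest, long latest) {
        switch (position) {
            case EARLIEST:
                return earliest;
            case LATEST:
                return latest;
            case LAST_COMMITTED:
                if (committed.isPresent()) {
                    long c = committed.getAsLong();
                    // Assumption: a committed offset outside the available
                    // range falls back to the earliest available offset.
                    if (c >= earliest && c <= latest) {
                        return c;
                    }
                }
                // Nothing usable was committed: start from the earliest.
                return earliest;
            default:
                throw new IllegalArgumentException("Unknown position: " + position);
        }
    }

    public static void main(String[] args) {
        // Example values only: the partition holds offsets 10..100.
        System.out.println(resolve(StartingPosition.EARLIEST, OptionalLong.empty(), 10, 100));
        System.out.println(resolve(StartingPosition.LAST_COMMITTED, OptionalLong.of(42), 10, 100));
    }
}
```

Whether an out-of-range or missing committed offset should fall back to the earliest or the latest offset is exactly the kind of choice the discussion would need to settle; the sketch picks earliest only to have something definite.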



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
