Dynamic assignment is what is causing all the issues that we see now:
1. Duplicates at the start of the KafkaSpout and upon any rebalance.
2. The Trident Kafka Spout not holding the transactional batches.
Many corner cases can easily produce duplicates.
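(For reference, a minimal sketch of the two assignment modes on the raw KafkaConsumer API, not Storm code; the broker address, group id, topic, and partition below are placeholders.)

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class AssignmentModes {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092");   // placeholder
            props.put("group.id", "my-spout-group");          // placeholder
            props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // Dynamic assignment: the group coordinator decides which partitions
                // this consumer owns and may revoke/reassign them on any rebalance.
                consumer.subscribe(Collections.singletonList("my-topic"));

                // Manual assignment (cannot be mixed with subscribe on the same consumer):
                // the consumer owns exactly the partitions it asks for, so there are
                // no group rebalances and no surprise revocations mid-batch.
                // consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
            }
        }
    }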
There is no point in keeping dynamic assignment given all the issues that are showing up. Here is an excerpt from the Kafka consumer docs (https://www-us.apache.org/dist/kafka/0.10.0.1/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html): "If the process itself is highly available and will be restarted if it fails (perhaps using a cluster management framework like YARN, Mesos, or AWS facilities, or as part of a stream processing framework). In this case there is no need for Kafka to detect the failure and reassign the partition since the consuming process will be restarted on another machine." Manual assignment is the right way to go.

-Harsha

On Jun 9, 2017, 4:09 PM -0700, Hugo Da Cruz Louro <hlo...@hortonworks.com>, wrote:

+1 for simplifying KafkaSpoutConfig. Too many constructors and too many methods. I am not sure it’s justifiable to have any methods that simply set KafkaConsumer properties. All of these properties should just go in a Map<String, Object>, which is what KafkaConsumer receives, and which is what was supported in the initial implementation. The names of the properties can be retrieved from org.apache.kafka.clients.consumer.ConsumerConfig. At this point we may have to keep backwards compatibility in mind. I am not sure we should completely discontinue dynamic partition assignment, as it is one of the primary features of the new Storm Kafka Client API. With that said, manual partition assignment should be supported and would solve a lot of the potential problems arising from dynamic partition assignment.

Hugo

On Jun 9, 2017, at 3:33 PM, Harsha <st...@harsha.io> wrote:

I think the question is why we need all those settings when a user can pass them via a Properties object with the consumer properties defined, or via a Map conf object. Having methods on top of the consumer config means that every time a Kafka consumer property is added or changed, someone needs to add a builder method. We need to get out of the way and let the user configure it the way they do for a typical Kafka Consumer; instead we have tens of methods that set properties for ConsumerConfig. Examples:

https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L317
https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L309

etc. All of these are specific to the KafkaConsumer config; users should be able to pass all of them via Properties.

https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L327

What is the benefit of adding that method? And we are forcing the protocol to be set to "SSL" in this method:

https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L318

Users can set the SSL properties and then select the securityProtocol "SASL_SSL", which requires both the Kerberos and SSL configs to be set. In the above case, calling setSSLTruststore changes security.protocol to "SSL". This could easily run into issues if a user sets the securityProtocol first to "SASL_SSL" and later calls setSSLTruststore, which silently changes it to "SSL". We are overriding these settings ourselves instead of letting the user figure them out from the Kafka consumer config page. In contrast, on the KafkaProducer side we have KafkaBolt, which does this: https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/bolt/KafkaBolt.java#L121
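(To make that contrast concrete, a minimal sketch of the property-driven pattern using only the plain Kafka producer API; this is not the actual KafkaBolt code, and the broker address, topic, and serializers are placeholders. The caller puts whatever ProducerConfig entries it wants into a Properties, and the component simply forwards them to the KafkaProducer.)

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class ForwardedProducerConfig {
        public static void main(String[] args) {
            // The user owns this map: any producer setting Kafka understands can go in,
            // with no per-property builder method in between.
            Properties producerProps = new Properties();
            producerProps.put("bootstrap.servers", "broker:9092");   // placeholder
            producerProps.put("acks", "all");
            producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

            // The component's only job is to hand the properties to the producer as-is.
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
                producer.send(new ProducerRecord<>("my-topic", "key", "value"));
            }
        }
    }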
I would add a Properties object instead of deriving it from the topology conf, but this approach is much easier for users to understand. The contract here is: put whatever producer configs the user wants in the conf object, and we create the producer out of that config. Honestly, these interfaces need to be simple and leave the user in control instead of adding our own interpretation.

Thanks,
Harsha

On Jun 9, 2017, 2:08 PM -0700, Stig Døssing <generalbas....@gmail.com>, wrote:

I'd be happy with a simpler KafkaSpoutConfig, but I think most of the configuration parameters have good reasons for being there. Any examples of parameters you think we should remove?

2017-06-09 21:34 GMT+02:00 Harsha <st...@harsha.io>:

+1 on using manual assignment for the reasons specified below. We will see duplicates even under stable conditions, which is not good. I don’t see any reason not to switch to manual assignment. While we are at it, we should refactor the KafkaConfig part. It should be as simple as accepting the Kafka consumer config or a properties file and forwarding it to the KafkaConsumer. We have made it overly complex, and unnecessarily so.

Thanks,
Harsha
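(As a closing illustration of the "accept the consumer config or a properties file and forward it to KafkaConsumer" idea, a minimal sketch using only the plain consumer API; the broker address, group id, topic, Kerberos service name, and truststore values are placeholders. Because the user states security.protocol directly as just another entry in the map, there is no convenience setter that can later flip SASL_SSL back to SSL.)

    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class ForwardedConsumerConfig {
        public static void main(String[] args) {
            Properties consumerProps = new Properties();
            consumerProps.put("bootstrap.servers", "broker:9092");            // placeholder
            consumerProps.put("group.id", "my-spout-group");                  // placeholder
            consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
            consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

            // Security settings are just more entries in the same map; the user states
            // the protocol once and no convenience setter overrides it afterwards.
            consumerProps.put("security.protocol", "SASL_SSL");
            consumerProps.put("sasl.kerberos.service.name", "kafka");                     // placeholder
            consumerProps.put("ssl.truststore.location", "/etc/security/truststore.jks"); // placeholder
            consumerProps.put("ssl.truststore.password", "changeit");                     // placeholder

            // A spout built this way would do nothing more than this with the map.
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
                consumer.subscribe(Collections.singletonList("my-topic"));    // placeholder topic
            }
        }
    }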