Hi Stig,

I think the KafkaSpoutConfig constructor is private, and it throws errors when using the approach you mentioned. Making it public defeats the purpose of the builder. Can you give it a shot and confirm at your end whether it's possible?

Thanks
Priyank

Sent from my iPhone
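(For context, a rough sketch of the Flux wiring being discussed is below. The component ids, bootstrap servers and topic name are placeholders, it assumes the Builder's (String bootstrapServers, String... topics) constructor, and whether the final "ref:" hand-off works at all depends on KafkaSpoutConfig exposing a public constructor that takes a Builder, which is exactly the open question here.)

name: "kafka-spout-topology"

components:
  - id: "spoutConfigBuilder"
    className: "org.apache.storm.kafka.spout.KafkaSpoutConfig$Builder"
    constructorArgs:
      - "localhost:9092"    # bootstrap servers (placeholder)
      - "my-topic"          # topic (placeholder)

  - id: "spoutConfig"
    className: "org.apache.storm.kafka.spout.KafkaSpoutConfig"
    constructorArgs:
      - ref: "spoutConfigBuilder"   # pass the builder itself, avoiding build()

spouts:
  - id: "kafkaSpout"
    className: "org.apache.storm.kafka.spout.KafkaSpout"
    constructorArgs:
      - ref: "spoutConfig"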
> On Jun 13, 2017, at 9:36 AM, Stig Døssing <generalbas....@gmail.com> wrote:
>
> Hi Priyank,
>
> I changed my mind since those mails were sent. I don't think setKey/Value are very useful. They couldn't be used with the default Kafka deserializers, only for deserializers implemented by the user, and then only if they were declared to implement SerializableDeserializer. I agree that we should remove them, and I'm not going to undo anything currently in the PR (unless there are objections on the PR, of course).
>
> With regard to getting rid of the builder pattern, I think it is a pretty nice pattern for Java. It looks to me like it should be possible to declare and configure the builder with "component:", and then pass it to the KafkaSpoutConfig constructor with "ref:" after (which lets you avoid calling build()). Doesn't this work?
>
> 2017-06-12 23:32 GMT+02:00 Priyank Shah <ps...@hortonworks.com>:
>
>> Hi Stig,
>>
>> I think the PR you created, https://github.com/apache/storm/pull/2155/files, gets rid of setKey and setValue. I am fine with it, and in fact that's what I was suggesting in the first place. However, your last two email replies suggest something else. Just making sure you are not going to undo anything in the PR and that we are on the same page about this change, i.e. no setKey or setValue, either for SerializableDeserializer implementations or the Kafka Deserializer interface. Only a string FQCN as a property.
>>
>> The other thing I propose is to get rid of the builder class. The reason is that constructing an object with the builder pattern requires builder.build(), and that doesn't work well with Flux YAML. I think we should be careful about implementing new connectors and make sure they work with YAML as well. I have commented on the PR as well. Unless someone else has a different opinion, can you address that as well?
>>
>> On 6/10/17, 2:05 AM, "Stig Døssing" <generalbas....@gmail.com> wrote:
>>
>> Priyank, I was a bit too hasty in the last response. The setKey/Value functions are necessary when users want to set only the key or the value deserializer. I think we should keep those. It may be possible to deduplicate the functionality on the API by removing the Builder constructors that take deserializers, and by getting rid of the setKey/Value functions that take Class instances, since those seem like a duplication of the consumer config functionality. This should get rid of a lot of the overloads.
>>
>> 2017-06-10 10:20 GMT+02:00 Stig Døssing <generalbas....@gmail.com>:
>>
>>> Harsha,
>>>
>>> +1 for simplifying away those methods that are just setting consumer config. The properties I think we should keep are all the spout configuration (timeouts, retry handling, tuple construction). Maybe we deprecate the consumer config functions on 1.x and remove them on master?
>>>
>>> Priyank,
>>>
>>> When the spout is declared, it takes type parameters to define the key and value types of the consumer. We are able to check at compile time that the deserializers match those expected types, e.g.
>>>
>>> SerializableDeserializer<Integer> des = ...
>>>
>>> KafkaSpoutConfig<Integer, String> config = KafkaSpoutConfig.builder("dummy", "dummy")
>>>     .setKey(des)
>>>     .build();
>>>
>>> KafkaSpout<String, String> wrongTypeSpout = new KafkaSpout<>(config);
>>>
>>> will not compile, while
>>>
>>> SerializableDeserializer<String> des = ...
>>>
>>> KafkaSpoutConfig<String, String> config = KafkaSpoutConfig.builder("dummy", "dummy")
>>>     .setKey(des)
>>>     .build();
>>>
>>> KafkaSpout<String, String> spout = new KafkaSpout<>(config);
>>>
>>> will. If we want to simplify the API, maybe we should just mirror the KafkaConsumer API more closely and remove the Builder setKey/Value methods. I can't think of a reason why a user should need to create a Builder of one type and then change the type later with setKey/Value. The deserializers can just go in through the Builder constructor.
>>>
>>> About KafkaTuple, I think it was done this way originally since requiring users to subclass KafkaTuple would be a breaking change. If we want to do it, it should go in 2.x only. I'm not necessarily opposed to doing it, but I don't really have a strong opinion on it.
>>>
>>> Hugo,
>>>
>>> I appreciate that the subscribe API is a major new feature of the 0.9 consumer, but I can't come up with any reason to use it in Storm. I don't think we should support it just because it is there. As mentioned upthread, the features offered by that API are already covered by Storm, so I'm not seeing the value in having it. If we can't come up with a use case for it I don't see a reason to allow users to configure it, especially given the non-obvious problems users who choose it are likely to run into.
>>>
>>> 2017-06-10 6:03 GMT+02:00 Harsha <st...@harsha.io>:
>>>
>>>> Dynamic assignment is what is causing all the issues that we see now:
>>>> 1. Duplicates at the start of the KafkaSpout and upon any rebalance.
>>>> 2. The Trident Kafka Spout not holding the transactional batches.
>>>> Many corner cases can easily produce duplicates.
>>>>
>>>> There is no point in keeping dynamic assignment given all the issues that are showing up. Here is the excerpt from the Kafka consumer docs
>>>> https://www-us.apache.org/dist/kafka/0.10.0.1/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
>>>> "If the process itself is highly available and will be restarted if it fails (perhaps using a cluster management framework like YARN, Mesos, or AWS facilities, or as part of a stream processing framework). In this case there is no need for Kafka to detect the failure and reassign the partition since the consuming process will be restarted on another machine."
>>>>
>>>> Manual assignment is the right way to go.
>>>>
>>>> -Harsha
>>>>
>>>> On Jun 9, 2017, 4:09 PM -0700, Hugo Da Cruz Louro <hlo...@hortonworks.com>, wrote:
>>>> +1 for simplifying KafkaSpoutConfig. Too many constructors and too many methods. I am not sure it's justifiable to have any methods that simply set KafkaConsumer properties. All of these properties should just go in a Map<String, Object>, which is what KafkaConsumer receives, and what was supported in the initial implementation. The names of the properties can be retrieved from org.apache.kafka.clients.consumer.ConsumerConfig. At this point we may have to keep in mind backwards compatibility.
>>>>
>>>> Not sure we should completely discontinue dynamic partition assignment, as it is one of the primary features of the new Storm Kafka Client API. With this said, manual partition assignment should be supported and would solve a lot of potential problems arising from dynamic partition assignment.
>>>>
>>>> Hugo
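(To make the two modes being debated above concrete, here is a bare-bones sketch against the plain KafkaConsumer API rather than the spout's internals; the broker address, group id, topic name and partition number are placeholders.)

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class AssignmentModes {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "storm-kafka-spout");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Dynamic assignment: the group coordinator picks this consumer's
            // partitions and may rebalance them at any time (the source of the
            // duplicate/rebalance issues described above).
            // consumer.subscribe(Collections.singletonList("my-topic"));

            // Manual assignment: the process decides its own partitions and no
            // group rebalancing happens; failure handling is left to the framework
            // supervising the process (Storm, in the spout's case).
            consumer.assign(Collections.singletonList(new TopicPartition("my-topic", 0)));
        }
    }
}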
>>>>
>>>> On Jun 9, 2017, at 3:33 PM, Harsha <st...@harsha.io> wrote:
>>>>
>>>> I think the question is why we need all those settings when a user can pass them via Properties with the consumer properties defined, or via the Map conf object. Having the methods on top of the consumer config means that every time a Kafka consumer property is added or changed, one needs to add a builder method. We need to get out of the way and let the user configure it like they do for a typical Kafka Consumer; instead we have tens of methods that set properties for ConsumerConfig.
>>>>
>>>> Examples:
>>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L317
>>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L309
>>>> etc. All of these are specific to the KafkaConsumer config; users should be able to pass all of them via Properties.
>>>>
>>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L327
>>>>
>>>> What's the benefit of adding that method? And we are forcing the protocol to "SSL" in this method:
>>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/spout/KafkaSpoutConfig.java#L318
>>>>
>>>> Users can set the SSL properties and then select the securityProtocol "SASL_SSL", which requires both kerberos and ssl configs to be set. In the above case, making a call to setSSLTruststore changes the security.protocol to "SSL". This could easily run into issues if the user sets securityProtocol first with "SASL_SSL" and then later calls setSSLTruststore, which changes it to "SSL".
>>>>
>>>> We are taking over these settings instead of letting users figure them out from the Kafka consumer config page.
>>>>
>>>> In contrast we have KafkaBolt, which does this:
>>>> https://github.com/apache/storm/blob/master/external/storm-kafka-client/src/main/java/org/apache/storm/kafka/bolt/KafkaBolt.java#L121
>>>> I would add a Properties object instead of deriving it from topologyConf, but this is much easier for users to understand. The contract here is: put whatever producer configs the user wants in the conf object, and we create the producer out of that config.
>>>>
>>>> Honestly, these interfaces need to be simple and let the user have control instead of adding our interpretation.
>>>>
>>>> Thanks,
>>>> Harsha
>>>>
>>>> On Jun 9, 2017, 2:08 PM -0700, Stig Døssing <generalbas....@gmail.com>, wrote:
>>>> I'd be happy with a simpler KafkaSpoutConfig, but I think most of the configuration parameters have good reasons for being there. Any examples of parameters you think we should remove?
>>>>
>>>> 2017-06-09 21:34 GMT+02:00 Harsha <st...@harsha.io>:
>>>>
>>>> +1 on using manual assignment for the reasons specified below. We will see duplicates even in stable conditions, which is not good. I don't see any reason not to switch to manual assignment. While we are at it we should refactor the KafkaConfig part. It should be as simple as accepting the Kafka consumer config or properties file and forwarding it to KafkaConsumer. We made it overly complex and unnecessary.
>>>>
>>>> Thanks,
>>>> Harsha
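(As a footnote, a rough sketch of the property-driven style Harsha is arguing for; the broker address, group id, truststore path and password are placeholders, and the property keys are the standard Kafka consumer ones. The point is that security.protocol is stated explicitly by the user and is never silently rewritten by a truststore setter.)

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PropertyDrivenConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Plain Kafka consumer keys, exactly as documented on the Kafka side.
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9093");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "storm-kafka-spout");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Security is just more properties: the user picks SASL_SSL (or SSL)
        // once, and adding truststore settings does not change the protocol.
        props.put("security.protocol", "SASL_SSL");
        props.put("ssl.truststore.location", "/path/to/truststore.jks");
        props.put("ssl.truststore.password", "changeit");

        // Everything above is forwarded untouched to the consumer.
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // ... poll loop would go here ...
        }
    }
}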