Regarding auto.commit.enable: during our last deploy we had a scenario in which we did not commit the offsets we read at all. This is atypical. It arises in a Lambda-like architecture, in which we use S3 to provide historical data to repopulate the near-real-time datastore during a deploy.
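Mark asks below whether a second property would say how often to commit. The Kafka 0.8 consumer already has a setting for that, `auto.commit.interval.ms`. The sketch below shows one way the processor could expose both settings as ordinary validated properties. It assumes this approach and does not reflect how GetKafka is actually implemented. The class and method names are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch, not NiFi's GetKafka code: surface the commit-related
// Kafka 0.8 consumer settings as validated inputs instead of hard-coding them.
public class CommitSettingsSketch {

    /**
     * Builds the commit-related consumer settings.
     *
     * @param autoCommit       value for auto.commit.enable
     * @param commitIntervalMs value for auto.commit.interval.ms; only used
     *                         (and only validated) when autoCommit is true
     */
    public static Properties commitSettings(boolean autoCommit, long commitIntervalMs) {
        if (autoCommit && commitIntervalMs <= 0) {
            throw new IllegalArgumentException(
                    "commit interval must be positive, got " + commitIntervalMs);
        }
        Properties props = new Properties();
        props.setProperty("auto.commit.enable", Boolean.toString(autoCommit));
        if (autoCommit) {
            props.setProperty("auto.commit.interval.ms", Long.toString(commitIntervalMs));
        }
        return props;
    }

    public static void main(String[] args) {
        // Normal operation: commit offsets automatically every 60 seconds.
        System.out.println(commitSettings(true, 60_000L));
        // Deploy-time replay: read messages but never commit their offsets.
        System.out.println(commitSettings(false, 0L));
    }
}
```

Passing `false` covers a run like the deploy described above, where offsets are read but deliberately not committed. The interval is simply omitted in that case rather than being sent to Kafka.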
Mainly, I think the user experience would be better if we had complete control over the GetKafka Processor's consumer config, i.e. the properties listed here: http://kafka.apache.org/documentation.html#consumerconfigs. There may be implementation details that make this impossible, but it would be the best case. It is probably safe to say the same about the Kafka Producer, although I have not run into any blockers with it as-is. I have added this to the JIRA ticket.

Also, a separate enhancement: I see a need to pass the last written offset along to subsequent Processors in a flow. I don't know whether this is even possible; I didn't look that closely at the code. It would be useful to have the option to pass the last offset along the flow as metadata. We could then pass FlowFile data around indexed by last offset. Not sure this is worth exploring, as it may be unique to our architecture.

*Jeremiah Adams*

Senior Software Developer
Pearson

2154 East Commons Ave.
Suite 400
Centennial, CO 80122

Always Learning
Learn more at www.pearson.com

On Mon, Jul 27, 2015 at 6:14 PM, Mark Payne <[email protected]> wrote:

> Jeremiah,
>
> We can certainly make "auto.offset.reset" configurable. Not
> sure how making "auto.commit.enable" configurable would work.
> Are you thinking that another property would be added to indicate how
> often to commit? Or would it work completely differently? Just need that
> fleshed out a bit more.
>
> I do like the suggestion of exposing the config properties as user-defined
> properties.
>
> I have created a ticket to track this information:
> https://issues.apache.org/jira/browse/NIFI-791
>
> Please feel free to update the ticket with any relevant information as you
> think of it.
>
> Thanks!
> -Mark
>
> ----------------------------------------
> > Date: Mon, 27 Jul 2015 15:42:37 -0600
> > Subject: GetKafka Processor and Hardcoded Kafka Consumer Configs
> > From: [email protected]
> > To: [email protected]
> >
> > The GetKafka processor has a couple of Kafka Consumer Config values that
> > are hard-coded:
> >
> > props.setProperty("auto.commit.enable", "true"); // just be explicit
> > props.setProperty("auto.offset.reset", "smallest");
> >
> > These should be configurable property values in the Processor. Most
> > notable for me is "auto.offset.reset". Smallest vs. Largest has some
> > implications for fault-tolerance strategies.
> >
> > It would be best to expose all of the available Kafka Consumer Config
> > properties. If these change between Kafka versions, though, it would create
> > maintenance work for the Processors.
> >
> > Another option would be to allow ad-hoc property values, so that the end user
> > just supplies the Kafka config values they want to override.
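The "ad-hoc property values" option from the original message can be sketched as follows. The values hard-coded today become defaults, and any user-supplied entries override them by Kafka property name. In NiFi the overrides would presumably come from user-defined (dynamic) properties, as Mark suggests. That wiring is assumed here and is not shown. The class and method names are hypothetical. The validation of `auto.offset.reset` assumes the Kafka 0.8 consumer values `smallest` and `largest` mentioned in the thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch, not NiFi's GetKafka code: keep today's hard-coded
// values as defaults and let ad-hoc user properties override them.
public class KafkaConsumerConfigSketch {

    // Values the Kafka 0.8 consumer accepts for auto.offset.reset.
    static final Set<String> OFFSET_RESET_VALUES =
            new HashSet<>(Arrays.asList("smallest", "largest"));

    // Defaults mirroring the values currently hard-coded in GetKafka.
    static Properties defaults() {
        Properties props = new Properties();
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.offset.reset", "smallest");
        return props;
    }

    /** Returns the defaults with every user-supplied override applied on top. */
    public static Properties merge(Map<String, String> overrides) {
        Properties props = defaults();
        for (Map.Entry<String, String> e : overrides.entrySet()) {
            props.setProperty(e.getKey(), e.getValue());
        }
        // Fail fast on a bad value instead of letting the consumer reject it later.
        String reset = props.getProperty("auto.offset.reset");
        if (!OFFSET_RESET_VALUES.contains(reset)) {
            throw new IllegalArgumentException(
                    "auto.offset.reset must be smallest or largest, got " + reset);
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("auto.offset.reset", "largest");
        Properties props = merge(overrides);
        System.out.println(props.getProperty("auto.offset.reset"));  // largest
        System.out.println(props.getProperty("auto.commit.enable")); // true
    }
}
```

Because overrides are applied by name, this approach avoids tracking every consumer property across Kafka versions. The trade-off is that a mistyped property name is silently passed through rather than rejected.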

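Jeremiah's separate idea of passing the last offset along the flow as metadata could look roughly like this. The sketch scans a batch of consumed messages and records the highest offset seen per topic and partition. It returns the result as a `String -> String` map, which is the shape NiFi FlowFile attributes take. A processor could then attach such a map with `ProcessSession.putAllAttributes`. The `Consumed` type and the `kafka.last.offset.<topic>.<partition>` attribute names are invented for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: derive "last offset" metadata from a consumed batch.
// Attribute names and the Consumed type are illustrative, not a NiFi API.
public class LastOffsetAttributesSketch {

    // Minimal stand-in for a consumed Kafka message's coordinates.
    public static final class Consumed {
        final String topic;
        final int partition;
        final long offset;

        public Consumed(String topic, int partition, long offset) {
            this.topic = topic;
            this.partition = partition;
            this.offset = offset;
        }
    }

    /** Maps "kafka.last.offset.<topic>.<partition>" to the highest offset seen there. */
    public static Map<String, String> lastOffsetAttributes(List<Consumed> batch) {
        Map<String, Long> highest = new TreeMap<>();
        for (Consumed m : batch) {
            String key = "kafka.last.offset." + m.topic + "." + m.partition;
            highest.merge(key, m.offset, Long::max); // keep the larger offset
        }
        Map<String, String> attributes = new TreeMap<>();
        highest.forEach((k, v) -> attributes.put(k, Long.toString(v)));
        return attributes;
    }

    public static void main(String[] args) {
        List<Consumed> batch = Arrays.asList(
                new Consumed("events", 0, 41L),
                new Consumed("events", 1, 7L),
                new Consumed("events", 0, 42L));
        System.out.println(lastOffsetAttributes(batch));
        // {kafka.last.offset.events.0=42, kafka.last.offset.events.1=7}
    }
}
```

The sketch tracks offsets per partition because Kafka offsets are only ordered within a single partition. A single "last offset" for a multi-partition batch would not be meaningful.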