Jeremiah,

Totally understand now. We can certainly add a property that indicates whether or not to commit the offsets. We should probably also document (at a very high level) the use-case that you are describing as an example of why you may want to not commit the offsets. I will update the ticket to include this.
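As a rough illustration of what such a property could look like, here is a minimal sketch that combines a commit-offsets flag with the user-defined overrides suggested in the quoted messages. It is not the actual GetKafka code: the class name ConsumerConfigSketch, the method buildConsumerProperties, and its parameters are hypothetical, and only the Kafka config keys ("auto.commit.enable", "auto.offset.reset", "auto.commit.interval.ms") come from the Kafka consumer documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch only; names are illustrative, not NiFi's implementation.
public class ConsumerConfigSketch {

    /**
     * Builds Kafka consumer properties from processor-level settings.
     *
     * @param commitOffsets value for "auto.commit.enable" (previously hard-coded to "true")
     * @param offsetReset   value for "auto.offset.reset", e.g. "smallest" or "largest"
     * @param userDefined   ad-hoc Kafka config overrides supplied by the end user
     */
    static Properties buildConsumerProperties(boolean commitOffsets,
                                              String offsetReset,
                                              Map<String, String> userDefined) {
        Properties props = new Properties();
        props.setProperty("auto.commit.enable", String.valueOf(commitOffsets));
        props.setProperty("auto.offset.reset", offsetReset);

        // Assumption: user-defined values are applied last, so they win over
        // the two explicit settings above if the same key is supplied.
        for (Map.Entry<String, String> entry : userDefined.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("auto.commit.interval.ms", "5000");

        // Do not commit offsets, as in the redeploy scenario described in this thread.
        Properties props = buildConsumerProperties(false, "largest", overrides);
        System.out.println(props.getProperty("auto.commit.enable"));
        System.out.println(props.getProperty("auto.offset.reset"));
        System.out.println(props.getProperty("auto.commit.interval.ms"));
    }
}
```

Applying the user-defined values last is only one possible choice; the processor could just as well reject overrides for keys it manages itself.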

Regarding the separate enhancement: when you say "the last written offset," are you referring to the offset that GetKafka writes to ZooKeeper? I do not believe that information is exposed by Kafka's "high-level consumer." It's probably possible if we were to change to the "simple consumer" API, but that interface is extremely different, so it unfortunately isn't a simple change. The FlowFiles that are received, though, do have a "kafka.offset" attribute, which indicates the offset of that individual message, if that helps?

Thanks
-Mark

----------------------------------------
> Date: Tue, 28 Jul 2015 08:56:21 -0600
> Subject: Re: GetKafka Processor and Hardcoded Kafka Consumer Configs
> From: [email protected]
> To: [email protected]
>
> In the case of auto.commit.enable - we had a scenario during our last
> deploy in which we did not commit the offsets we read at all. This is
> atypical. This is in the case of a Lambda-like architecture in which we use
> S3 to provide historical data to repopulate the near real-time datastore
> during a deploy.
>
> Mostly, I think that the user experience would be better if we had complete
> control over the GetKafka Processor config here:
> http://kafka.apache.org/documentation.html#consumerconfigs.
> There may be implementation details that make it impossible, but it would
> be the best case. I think it is probably safe to say the same about the
> Kafka Producer - but I have not run into any blockers as-is. I have added
> this to the jira ticket.
>
> Also, a separate enhancement:
>
> I see a need to pass along the last written offset to subsequent Processors
> in a flow. I don't know if this is even possible; I didn't look that
> closely at the code. It could be useful if it were possible to have the
> option to pass the last Offset along the flow as metadata. We could then
> pass around FlowFile data indexed by last Offset. Dunno if this is worth
> exploring as it may be unique to our architecture.
>
>
> *Jeremiah Adams*
>
> Senior Software Developer
> Pearson
>
> 2154 East Commons Ave.
> Suite 400
> Centennial, CO 80122
>
>
> Always Learning
> Learn more at www.pearson.com
>
> On Mon, Jul 27, 2015 at 6:14 PM, Mark Payne <[email protected]> wrote:
>
>> Jeremiah,
>>
>> We can certainly make "auto.offset.reset" configurable. Not
>> sure how making "auto.commit.enable" configurable would work.
>> Are you thinking that another property would be added to indicate how
>> often to commit? Or would it work completely differently? Just need that
>> fleshed out a bit more.
>>
>> I do like the suggestion of exposing the config properties as user-defined
>> properties.
>>
>> I have created a ticket to track this information:
>> https://issues.apache.org/jira/browse/NIFI-791
>>
>> Please feel free to update the ticket with any relevant information as you
>> think of it.
>>
>> Thanks!
>> -Mark
>>
>> ----------------------------------------
>>> Date: Mon, 27 Jul 2015 15:42:37 -0600
>>> Subject: GetKafka Processor and Hardcoded Kafka Consumer Configs
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> The GetKafka processor has a couple of Kafka Consumer Config values that
>>> are hard-coded.
>>>
>>> props.setProperty("auto.commit.enable", "true"); // just be explicit
>>> props.setProperty("auto.offset.reset", "smallest");
>>>
>>> These should be configurable property values in the Processor. Most
>>> notable for me is "auto.offset.reset". Smallest vs. Largest has some
>>> implications concerning fault tolerance strategies.
>>>
>>> It would be best to expose all of the available Kafka Consumer Config
>>> properties. If these change between Kafka versions, though, it would create
>>> maintenance work for the Processors.
>>>
>>> Another option would be to allow ad-hoc property values and have the end
>>> user just supply the Kafka config values they want to override.
