I missed the kafka.offset attribute. That is what I needed. Thanks, Mark.

*Jeremiah Adams*
Senior Software Developer
Pearson
2154 East Commons Ave.
Suite 400
Centennial, CO 80122

Always Learning
Learn more at www.pearson.com

On Tue, Jul 28, 2015 at 9:28 AM, Mark Payne <[email protected]> wrote:

> Jeremiah,
>
> Totally understand now. We can certainly add a property that indicates
> whether or not to commit the offsets. We should probably also document
> (at a very high level) the use-case that you are describing as an example
> of why you may want to not commit the offsets. I will update the ticket
> to include this.
>
> Regarding the separate enhancement: when you say "the last written
> offset" are you referring to when GetKafka writes the offset to
> ZooKeeper? I do not believe that information is exposed by their
> "High-level consumer." It's probably possible if we were to change to
> the "simple consumer" API, but that interface is extremely different so
> it unfortunately isn't a simple change.
>
> The FlowFiles that are received, though, do have a "kafka.offset"
> attribute, which indicates the offset of that individual message, if
> that helps?
>
> Thanks
> -Mark
>
> ----------------------------------------
> > Date: Tue, 28 Jul 2015 08:56:21 -0600
> > Subject: Re: GetKafka Processor and Hardcoded Kafka Consumer Configs
> > From: [email protected]
> > To: [email protected]
> >
> > In the case of auto.commit.enable - we had a scenario during our last
> > deploy in which we did not commit the offsets we read at all. This is
> > atypical. It arises in a Lambda-like architecture in which we use S3
> > to provide historical data to repopulate the near real-time datastore
> > during a deploy.
> >
> > Mostly, I think that the user experience would be better if we had
> > complete control over the GetKafka Processor config here:
> > http://kafka.apache.org/documentation.html#consumerconfigs.
> > There may be implementation details that make it impossible, but it
> > would be the best case. I think it is probably safe to say the same
> > about the Kafka Producer - but I have not run into any blockers as-is.
> > I have added this to the jira ticket.
> >
> > Also, a separate enhancement:
> >
> > I see a need to pass along the last written offset to subsequent
> > Processors in a flow. I don't know if this is even possible, I didn't
> > look that closely at the code. It could be useful if it were possible
> > to have the option to pass the last offset along the flow as metadata.
> > We could then pass around FlowFile data indexed by last offset. Dunno
> > if this is worth exploring as it may be unique to our architecture.
> >
> > *Jeremiah Adams*
> >
> > On Mon, Jul 27, 2015 at 6:14 PM, Mark Payne <[email protected]> wrote:
> >
> >> Jeremiah,
> >>
> >> We can certainly enable the "auto.offset.reset" to be configurable.
> >> Not sure how making the "auto.commit.enable" configurable would work.
> >> Are you thinking that another property would be added to indicate how
> >> often to commit? Or would it work completely differently? Just need
> >> that fleshed out a bit more.
> >>
> >> I do like the suggestion of exposing the config properties as
> >> user-defined properties.
> >>
> >> I have created a ticket to track this information:
> >> https://issues.apache.org/jira/browse/NIFI-791
> >>
> >> Please feel free to update the ticket with any relevant information
> >> as you think of it.
> >>
> >> Thanks!
> >> -Mark
> >>
> >> ----------------------------------------
> >>> Date: Mon, 27 Jul 2015 15:42:37 -0600
> >>> Subject: GetKafka Processor and Hardcoded Kafka Consumer Configs
> >>> From: [email protected]
> >>> To: [email protected]
> >>>
> >>> The GetKafka processor has a couple of Kafka Consumer Config values
> >>> that are hard-coded.
> >>>
> >>> props.setProperty("auto.commit.enable", "true"); // just be explicit
> >>> props.setProperty("auto.offset.reset", "smallest");
> >>>
> >>> These should be configurable property values in the Processor. Most
> >>> notable for me is the "auto.offset.reset". Smallest vs. Largest has
> >>> some implications concerning fault tolerance strategies.
> >>>
> >>> It would be best to expose all of the available Kafka Consumer Config
> >>> properties. If these change between Kafka versions, though, it would
> >>> create maintenance work for the Processors.
> >>>
> >>> Another option would be to allow ad-hoc property values and have the
> >>> end user supply just the Kafka config values they want to override.
> >>>
> >>> *Jeremiah Adams*
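
For reference, the ad-hoc override idea from the original message could look roughly like the sketch below. This is only an illustration and not the GetKafka implementation: the class name ConsumerConfigSketch and the helper buildConsumerProps are hypothetical, and the only values taken from the thread are the two hard-coded settings. The defaults are set first, then whatever the user supplies is layered on top.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical sketch: not part of NiFi or Kafka.
class ConsumerConfigSketch {

    // Start from the two values quoted in the thread as hard-coded,
    // then let user-supplied entries override them.
    static Properties buildConsumerProps(Map<String, String> userOverrides) {
        Properties props = new Properties();
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.offset.reset", "smallest");

        // Any key the user supplies replaces the default of the same name;
        // keys without a default are simply added.
        for (Map.Entry<String, String> entry : userOverrides.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        // Example overrides matching the use-case discussed above:
        // read from the latest offset and do not commit offsets.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("auto.offset.reset", "largest");
        overrides.put("auto.commit.enable", "false");

        Properties props = buildConsumerProps(overrides);
        System.out.println("auto.offset.reset = " + props.getProperty("auto.offset.reset"));
        System.out.println("auto.commit.enable = " + props.getProperty("auto.commit.enable"));
    }
}
```

Passing the overrides through as plain key/value pairs means the processor would not need to track every consumer config name, which is the maintenance concern raised in the original message. Whether Kafka accepts a given key or value would still be up to the consumer itself.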
