In the case of auto.commit.enable: during our last deploy we had a scenario
in which we did not commit the offsets we read at all. This is atypical. It
comes from our Lambda-like architecture, in which we use S3 to provide
historical data to repopulate the near real-time datastore during a deploy.

Mostly, I think that the user experience would be better if we had complete
control over the GetKafka Processor's consumer config, as documented here:
http://kafka.apache.org/documentation.html#consumerconfigs.
There may be implementation details that make this impossible, but it would
be the best case. I think it is probably safe to say the same about the
Kafka Producer, but I have not run into any blockers as-is. I have added
this to the JIRA ticket.
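
As a rough sketch of what I mean (this is not the actual GetKafka code; the
class and method names are hypothetical), the two values that are hard-coded
today could become defaults, with any user-supplied Kafka consumer property
overriding them:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class ConsumerConfigSketch {

    // Hypothetical helper: start from the two values GetKafka hard-codes
    // today, then let user-supplied properties override them.
    public static Properties buildConsumerProps(Map<String, String> userOverrides) {
        Properties props = new Properties();
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.offset.reset", "smallest");
        for (Map.Entry<String, String> entry : userOverrides.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        return props;
    }

    public static void main(String[] args) {
        // Our deploy scenario: read without committing offsets.
        Map<String, String> overrides = new HashMap<>();
        overrides.put("auto.commit.enable", "false");

        Properties props = buildConsumerProps(overrides);
        System.out.println(props.getProperty("auto.commit.enable"));
        System.out.println(props.getProperty("auto.offset.reset"));
    }
}
```

Anything the user does not override keeps its current default, so existing
flows would behave the same as they do now.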

Also, a separate enhancement:

I see a need to pass along the last written offset to subsequent Processors
in a flow. I don't know if this is even possible; I didn't look that
closely at the code. It would be useful to have the option to pass the last
offset along the flow as metadata. We could then pass around FlowFile data
indexed by last offset. I don't know if this is worth exploring, as it may be
unique to our architecture.
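
To make the idea concrete, here is a minimal sketch in plain Java that does
not use the NiFi API. The attribute name "kafka.offset" and the class are
made up for illustration; in a real Processor the map would presumably be
the FlowFile's attributes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class OffsetTaggingSketch {

    // Hypothetical attribute name; GetKafka does not set this today.
    static final String OFFSET_ATTR = "kafka.offset";

    // Return a copy of the attributes with the last offset attached.
    public static Map<String, String> withOffset(Map<String, String> attributes, long lastOffset) {
        Map<String, String> copy = new HashMap<>(attributes);
        copy.put(OFFSET_ATTR, Long.toString(lastOffset));
        return copy;
    }

    public static void main(String[] args) {
        // A downstream step could index payloads by the offset attribute.
        TreeMap<Long, String> byOffset = new TreeMap<>();
        Map<String, String> attrs = withOffset(new HashMap<String, String>(), 42L);
        byOffset.put(Long.parseLong(attrs.get(OFFSET_ATTR)), "payload");
        System.out.println(byOffset.lastKey());
    }
}
```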


*Jeremiah Adams*

Senior Software Developer
Pearson

2154 East Commons Ave.
Suite 400
Centennial, CO 80122


Always Learning
Learn more at www.pearson.com

On Mon, Jul 27, 2015 at 6:14 PM, Mark Payne <[email protected]> wrote:

> Jeremiah,
>
> We can certainly enable the "auto.offset.reset" to be configurable. Not
> sure how making the "auto.commit.enable" configurable would work.
> Are you thinking that another property would be added to indicate how
> often to commit? Or would it work completely differently? Just need that
> fleshed out a bit more.
>
> I do like the suggestion of exposing the config properties as user-defined
> properties.
>
> I have created a ticket to track this information:
> https://issues.apache.org/jira/browse/NIFI-791
>
> Please feel free to update the ticket with any relevant information as you
> think of it.
>
> Thanks!
> -Mark
>
> ----------------------------------------
> > Date: Mon, 27 Jul 2015 15:42:37 -0600
> > Subject: GetKafka Processor and Hardcoded Kafka Consumer Configs
> > From: [email protected]
> > To: [email protected]
> >
> > The GetKafka processor has a couple of Kafka Consumer Config values that
> > are hard-coded.
> >
> > props.setProperty("auto.commit.enable", "true"); // just be explicit
> > props.setProperty("auto.offset.reset", "smallest");
> >
> > These should be configurable property values in the Processor. Most
> > notable for me is the "auto.offset.reset". Smallest vs. Largest has some
> > implications concerning fault tolerance strategies.
> >
> > It would be best to expose all of the available Kafka Consumer Config
> > properties. If these change though between kafka versions it would create
> > maintenance work for the Processors.
> >
> > Another option would be to allow ad-hoc property values and end-user just
> > supply the kafka config values they want to override.
> >
>
