Jeremiah,

Totally understand now. We can certainly add a property that indicates
whether or not to commit the offsets. We should probably also document
(at a very high level) the use case that you are describing as an example
of why you may want to not commit the offsets. I will update the ticket
to include this.
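
To make the idea concrete, here is a rough sketch of how such a property
could feed into the consumer config. This is not the actual GetKafka code:
the class name, the method name, and the "commitOffsets" and "overrides"
parameters are all hypothetical. Only the two Kafka config keys come from
the hard-coded lines quoted in this thread.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper, not part of NiFi: builds the Kafka consumer config
// from processor-level settings instead of hard-coded values.
final class ConsumerConfigSketch {

    static Properties buildConsumerProps(boolean commitOffsets,
                                         String offsetReset,
                                         Map<String, String> overrides) {
        Properties props = new Properties();

        // Previously hard-coded to "true"; now driven by a processor property.
        props.setProperty("auto.commit.enable", String.valueOf(commitOffsets));

        // Previously hard-coded to "smallest"; now driven by a processor property.
        props.setProperty("auto.offset.reset", offsetReset);

        // User-defined properties are applied last, so they win over the
        // values set above when the same key is supplied.
        for (Map.Entry<String, String> entry : overrides.entrySet()) {
            props.setProperty(entry.getKey(), entry.getValue());
        }
        return props;
    }
}
```

Applying the user-supplied values last is a design choice in this sketch,
not something we have decided on; it simply means an explicit override
always takes precedence over the processor's defaults.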

Regarding the separate enhancement: when you say "the last written offset,"
are you referring to when GetKafka writes the offset to ZooKeeper? I do not
believe that information is exposed by Kafka's "High-level consumer." It's
probably possible if we were to change to the "simple consumer" API, but
that interface is extremely different, so it unfortunately isn't a simple
change.

The FlowFiles that are received, though, do have a "kafka.offset"
attribute, which indicates the offset of that individual message, if that
helps?
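
As a rough illustration of how a downstream step might use that attribute,
here is a sketch that finds the highest offset seen across a batch of
FlowFiles. It is only a sketch under stated assumptions: plain
Map<String, String> objects stand in for FlowFile attributes, and the class
and method names are hypothetical, not NiFi API.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical helper, not part of NiFi: each Map stands in for the
// attributes of one FlowFile received from GetKafka.
final class KafkaOffsetSketch {

    static final String OFFSET_ATTRIBUTE = "kafka.offset";

    // Returns the largest "kafka.offset" value found, or an empty
    // OptionalLong if no map carries the attribute. A non-numeric value
    // would throw NumberFormatException; this sketch does not handle that.
    static OptionalLong highestOffset(List<Map<String, String>> attributeMaps) {
        return attributeMaps.stream()
                .map(attributes -> attributes.get(OFFSET_ATTRIBUTE))
                .filter(value -> value != null)
                .mapToLong(Long::parseLong)
                .max();
    }
}
```

Note that this is the highest offset read in the batch, which is not
necessarily the same thing as the last offset committed to ZooKeeper.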

Thanks
-Mark


----------------------------------------
> Date: Tue, 28 Jul 2015 08:56:21 -0600
> Subject: Re: GetKafka Processor and Hardcoded Kafka Consumer Configs
> From: [email protected]
> To: [email protected]
>
> In the case of auto.commit.enable - we had a scenario during our last
> deploy in which we did not commit the offsets we read at all. This is
> atypical. This is in the case of a Lambda-like architecture in which we use
> S3 to provide historical data to repopulate the near real-time datastore
> during a deploy.
>
> Mostly, I think that the user experience would be better if we had complete
> control over the GetKafka Processor config here:
> http://kafka.apache.org/documentation.html#consumerconfigs.
> There may be implementation details that make it impossible, but it would
> be the best case. I think it is probably safe to say the same about the
> Kafka Producer - but I have not run into any blockers as-is. I have added
> this to the jira ticket.
>
> Also, a separate enhancement:
>
> I see a need to pass along the last written offset to subsequent Processors
> in a flow. I don't know if this is even possible; I didn't look that
> closely at the code. It could be useful if it were possible to have the
> option to pass the last offset along the flow as metadata. We could then
> pass around FlowFile data indexed by last offset. I don't know if this is
> worth exploring, as it may be unique to our architecture.
>
>
> *Jeremiah Adams*
>
> Senior Software Developer
> Pearson
>
> 2154 East Commons Ave.
> Suite 400
> Centennial, CO 80122
>
>
> Always Learning
> Learn more at www.pearson.com
>
> On Mon, Jul 27, 2015 at 6:14 PM, Mark Payne <[email protected]> wrote:
>
>> Jeremiah,
>>
>> We can certainly enable the "auto.offset.reset" to be configurable. Not
>> sure how making the "auto.commit.enable" configurable would work.
>> Are you thinking that another property would be added to indicate how
>> often to commit? Or would it work completely differently? Just need that
>> fleshed out a bit more.
>>
>> I do like the suggestion of exposing the config properties as user-defined
>> properties.
>>
>> I have created a ticket to track this information:
>> https://issues.apache.org/jira/browse/NIFI-791
>>
>> Please feel free to update the ticket with any relevant information as you
>> think of it.
>>
>> Thanks!
>> -Mark
>>
>> ----------------------------------------
>>> Date: Mon, 27 Jul 2015 15:42:37 -0600
>>> Subject: GetKafka Processor and Hardcoded Kafka Consumer Configs
>>> From: [email protected]
>>> To: [email protected]
>>>
>>> The GetKafka processor has a couple of Kafka Consumer Config values that
>>> are hard-coded.
>>>
>>> props.setProperty("auto.commit.enable", "true"); // just be explicit
>>> props.setProperty("auto.offset.reset", "smallest");
>>>
>>> These should be configurable property values in the Processor. Most
>>> notable for me is the "auto.offset.reset". Smallest vs. Largest has some
>>> implications concerning fault tolerance strategies.
>>>
>>> It would be best to expose all of the available Kafka Consumer Config
>>> properties. If these change between Kafka versions, though, it would
>>> create maintenance work for the Processors.
>>>
>>> Another option would be to allow ad-hoc property values and have the end
>>> user supply only the Kafka config values they want to override.
>>>
>>>
>>> *Jeremiah Adams*
>>
                                          
