Thanks for taking the time to go over the KIP Colin.

While I agree with both your points about error handling, I think this KIP 
focuses on just exposing these errors via the proposed metrics and does not 
alter the error handling behavior on either the brokers or the controllers. The 
metrics (as proposed) are somewhat independent of that and should just hook 
into whatever we are doing. If we change the error handling then the metrics 
should be hooked into the new code as well.

Maybe the point you are trying to make is that having the properties which you 
mention below will generally make it so that errors are not committed to the 
log and hence having the metrics for the is sufficient (as the error will 
likely be transient and recoverable?).

Apologies if I did not understand the intent of your comment.

- Niket

> On Aug 2, 2022, at 3:34 PM, Colin McCabe <cmcc...@apache.org> wrote:
> 
> Hi Niket,
> 
> Thanks for the KIP -- much appreciated! The new metrics look very useful.
> 
> I agree with the proposed error handling for errors on standby controllers 
> and brokers. For active controllers, I think we should establish two points:
> 
> 1. the active controller replays metadata before submitting it to the Raft 
> quorum
> 2. metadata replay errors on the active cause the process to exit, prior to 
> attempting to commit the record
> 
> This will allow most of these metadata replay errors to be noticed and NOT 
> committed to the metadata log, which I think will make things much more 
> robust. Since the controller process can be restarted very quickly, it 
> shouldn't be an undue operational burden. (It's true that when in combined 
> mode, restarts will take longer, but this kind of tradeoff is integral to 
> combined mode -- you get reduced fault isolation in exchange for the lower 
> overhead of one fewer JVM process).
> 
> best,
> Colin
> 
> 
> On Mon, Aug 1, 2022, at 18:05, David Arthur wrote:
>> Thanks, Niket.
>> 
>> +1 binding from me
>> 
>> -David
>> 
>> On Mon, Aug 1, 2022 at 8:15 PM Niket Goel <ng...@confluent.io.invalid> wrote:
>>> 
>>> Hi all,
>>> 
>>> I would like to start a vote on KIP-859 which adds some new metrics to 
>>> KRaft to allow for better visibility into log processing errors.
>>> 
>>> KIP 
>>> —ttps://cwiki.apache.org/confluence/display/KAFKA/KIP-859%3A+Add+Metadata+Log+Processing+Error+Related+Metrics
>>>  
>>> <https://www.google.com/url?q=https://cwiki.apache.org/confluence/display/KAFKA/KIP-859%253A%2BAdd%2BMetadata%2BLog%2BProcessing%2BError%2BRelated%2BMetrics&source=gmail-imap&ust=1660084494000000&usg=AOvVaw3enbkssNlSaDhPrm6faYby>
>>> Discussion Thread — 
>>> https://www.google.com/url?q=https://lists.apache.org/thread/yl87h1s484yc09yjo1no46hwpbv0qkwt&source=gmail-imap&ust=1660084494000000&usg=AOvVaw2R3Sj0u0NlQOG9XHh-Wgzs
>>> 
>>> Thanks
>>> Niket
>>> 

Reply via email to