[ 
https://issues.apache.org/jira/browse/KAFKA-14920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17715227#comment-17715227
 ] 

Justine Olshan commented on KAFKA-14920:
----------------------------------------

I originally considered this, but we can't do it because we append the relevant
information when the records are appended to the log. If we do it earlier, we
break this logic. (This logic is also persisted to disk and reloaded.) The
information is stored in the records themselves, and I don't know if there is a
great way to handle this. How do we know when the record is actually being
appended vs. only being verified, unless we add another in-memory state?

This is why we added a state machine in KAFKA-14904.

> Address timeouts and out of order sequences
> -------------------------------------------
>
>                 Key: KAFKA-14920
>                 URL: https://issues.apache.org/jira/browse/KAFKA-14920
>             Project: Kafka
>          Issue Type: Sub-task
>            Reporter: Justine Olshan
>            Assignee: Justine Olshan
>            Priority: Blocker
>
> KAFKA-14844 showed the destructive nature of a timeout on the first produce
> request for a topic partition (i.e., one that has no state in psm).
> Since we currently don't validate the first sequence (we will in part 2 of
> KIP-890), any transient error on the first produce can lead to out-of-order
> sequences that never recover.
> Originally, KAFKA-14561 relied on the producer's retry mechanism for these
> transient issues, but until that is fixed, we may need to retry from the
> AddPartitionsManager instead. We addressed the concurrent transactions, but
> there are other errors, like coordinator loading, that we could run into and
> that would increase out-of-order issues.



