[ https://issues.apache.org/jira/browse/KAFKA-7190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16572763#comment-16572763 ]

Jason Gustafson commented on KAFKA-7190:
----------------------------------------

This is a tough one. To guarantee transaction semantics, we need to retain 
producer state in the log. Without that state, our only options are to raise an 
error or to weaken semantics. The problem with deleting beyond the LSO (last 
stable offset) is that we may lose the producer state of an active transaction. 
As I understand it, the proposal here is to retain the state in memory even 
though we have lost it in the log, but in the worst case, we would still end up 
raising the UNKNOWN_PRODUCER_ID error. The log is ultimately the source of 
truth for producer state. Doesn't it seem odd that a call to DeleteRecords can 
effectively kill a producer with an active transaction? What I'm wondering is 
whether deletion can be "soft" in the case that the requested offset is higher 
than the LSO. We can advance the log start offset to the new offset but retain 
the data in the log until the LSO has reached the new log start offset. Then we 
could guarantee that the producer state of an active transaction is never lost.
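
To make that concrete, here is a minimal sketch of what "soft" deletion might 
look like; all names here are hypothetical, not actual broker code:

{code:java}
// Minimal sketch of "soft" DeleteRecords; hypothetical names, not broker code.
final class SoftDeleteSketch {
    private long logStartOffset;   // offset exposed to clients
    private long lastStableOffset; // first offset of the earliest open transaction

    // DeleteRecords advances the visible log start offset right away...
    void deleteRecords(long requestedOffset) {
        logStartOffset = Math.max(logStartOffset, requestedOffset);
        // ...but physical deletion is capped at the LSO, so the producer
        // state of an open transaction is never removed from the log.
        deleteSegmentsBefore(Math.min(requestedOffset, lastStableOffset));
    }

    // Once the transaction completes and the LSO catches up, the
    // soft-deleted range can actually be reclaimed.
    void onLsoAdvance(long newLastStableOffset) {
        lastStableOffset = newLastStableOffset;
        deleteSegmentsBefore(Math.min(logStartOffset, lastStableOffset));
    }

    private void deleteSegmentsBefore(long offset) { /* elided */ }
}
{code}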

This is useful because if a transactional produce request arrives and we have 
no producer state, then we know it is either the start of a new transaction, 
which is safe to allow, or a stale write from a fenced producer. 
The holy grail is being able to distinguish between these two cases. One option 
I was thinking about is letting each transaction start at sequence number 0. 
This would allow us to distinguish these two cases for all but the first record 
in a transaction. Leaving the one loose end is not satisfying, but technically 
it was already loose before. It is possible today for a producer to start a 
transaction and then become a zombie. If its transaction gets aborted by the 
coordinator and the state is lost due to a call to DeleteRecords, then the 
zombie can still wake up and write to the partition. I'm not too sure how we'll 
fix this, but the point is we have to fix it anyway. 
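
For illustration, a minimal sketch of the per-transaction sequence check; 
again, all names are hypothetical, not actual broker code:

{code:java}
// Sketch of the "each transaction starts at sequence 0" idea.
final class SequenceCheckSketch {
    enum Decision { ACCEPT, STALE_WRITE, OUT_OF_ORDER_SEQUENCE }

    static Decision validate(Integer lastKnownSequence, int firstSequenceInBatch) {
        if (lastKnownSequence == null) {
            // No producer state in the log. If transactions always start at
            // sequence 0, a zero first sequence looks like a new transaction
            // and is safe to accept; anything else must be a stale write from
            // a fenced producer. The loose end: a fenced producer's very first
            // record also carries sequence 0 and is indistinguishable.
            return firstSequenceInBatch == 0 ? Decision.ACCEPT : Decision.STALE_WRITE;
        }
        // Normal case: sequences must be consecutive.
        return firstSequenceInBatch == lastKnownSequence + 1
                ? Decision.ACCEPT
                : Decision.OUT_OF_ORDER_SEQUENCE;
    }
}
{code}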

> Under low traffic conditions purging repartition topics causes WARN statements 
> about UNKNOWN_PRODUCER_ID 
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-7190
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7190
>             Project: Kafka
>          Issue Type: Improvement
>          Components: core, streams
>    Affects Versions: 1.1.0, 1.1.1
>            Reporter: Bill Bejeck
>            Assignee: lambdaliu
>            Priority: Major
>
> When a streams application has little traffic, it is possible that consumer 
> purging deletes even the last message sent by a producer (i.e., all the 
> messages sent by this producer have been consumed and committed), and as a 
> result, the broker deletes that producer's ID. The next time this producer 
> tries to send, it will get the UNKNOWN_PRODUCER_ID error code, but in this 
> case the error is retriable: the producer can simply get a new producer ID 
> and retry, and this time it will succeed. 
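>
> A minimal sketch (hypothetical names, not actual broker code) of the 
> broker-side expiry described above: once the log start offset passes a 
> producer's last record, its state entry is dropped, so the producer's next 
> write finds no state and gets UNKNOWN_PRODUCER_ID back.
> {code:java}
> import java.util.HashMap;
> import java.util.Map;
>
> final class ProducerStateSketch {
>     record Entry(long lastOffset) { }
>
>     private final Map<Long, Entry> producers = new HashMap<>(); // by producer ID
>
>     void onLogStartOffsetAdvance(long newLogStartOffset) {
>         // A producer whose last record fell below the log start offset is
>         // forgotten, as if it had never written to the partition.
>         producers.values().removeIf(e -> e.lastOffset() < newLogStartOffset);
>     }
>
>     boolean hasState(long producerId) {
>         return producers.containsKey(producerId); // false => UNKNOWN_PRODUCER_ID
>     }
> }
> {code}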
>  
> Possible fixes could be on the broker side, i.e., delaying the deletion of 
> the producer IDs for a more extended period, or on the streams side, 
> developing a more conservative approach to deleting offsets from repartition 
> topics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
