[ https://issues.apache.org/jira/browse/KAFKA-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stanislav Kozlovski updated KAFKA-10301:
----------------------------------------
    Description: 
In Partition#updateAssignmentAndIsr, we would previously update
`Partition#remoteReplicasMap` by adding the new replicas to the map and then
removing the old ones
([source](https://github.com/apache/kafka/blob/7f9187fe399f3f6b041ca302bede2b3e780491e7/core/src/main/scala/kafka/cluster/Partition.scala#L657)).

During a recent refactoring, we changed it to first clear the map and then add
all the replicas to it
([source](https://github.com/apache/kafka/blob/2.6/core/src/main/scala/kafka/cluster/Partition.scala#L663)).
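
For illustration, here is a minimal sketch of the two update orders. This is
not the actual Partition code: `Replica` and the plain `ConcurrentHashMap` are
stand-ins for the real kafka.cluster types, and the method names are made up.

```scala
import java.util.concurrent.ConcurrentHashMap
import scala.jdk.CollectionConverters._

object RemoteReplicasMapSketch {
  // Stand-in for kafka.cluster.Replica; illustration only.
  case class Replica(brokerId: Int)

  val remoteReplicasMap = new ConcurrentHashMap[Int, Replica]()

  // Previous behavior: add the new replicas first, then remove the stale
  // ones. A reader that holds no lock may briefly see a superset of the
  // assignment, but never an empty map.
  def updateAddThenRemove(newAssignment: Seq[Int]): Unit = {
    newAssignment.foreach(id => remoteReplicasMap.put(id, Replica(id)))
    val stale = remoteReplicasMap.keySet.asScala.toSet -- newAssignment
    stale.foreach(id => remoteReplicasMap.remove(id))
  }

  // 2.6 behavior: clear first, then repopulate. Between clear() and the
  // puts, a reader that holds no lock can observe a completely empty map.
  def updateClearThenAdd(newAssignment: Seq[Int]): Unit = {
    remoteReplicasMap.clear()
    newAssignment.foreach(id => remoteReplicasMap.put(id, Replica(id)))
  }
}
```

In other words, the old order at worst exposes a stale or superset view to
concurrent readers, while the new order exposes a window in which the map has
no entries at all.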

While this update is done in a write lock (`inWriteLock(leaderIsrUpdateLock)`),
not all callers that access the map use a lock. Some examples:
 - Partition#updateFollowerFetchState
 - DelayedDeleteRecords#tryComplete
 - Partition#getReplicaOrException - called without a lock in
`checkEnoughReplicasReachOffset`, which itself is called by DelayedProduce. I
think this can fail a `ReplicaManager#appendRecords` call; the sketch below
shows the interleaving.
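
To make the failure mode concrete, here is a hypothetical interleaving,
continuing the sketch above. The exception type is a stand-in; the real
`getReplicaOrException` throws a Kafka-specific exception.

```scala
// Hypothetical schedule (not a reproduced trace):
//
//   writer (holds leaderIsrUpdateLock)     unlocked reader (DelayedProduce path)
//   ----------------------------------     -------------------------------------
//   remoteReplicasMap.clear()
//                                          getReplicaOrException(id)
//                                            -> map momentarily empty, throws
//   put(id1, ...); put(id2, ...)
//
object RaceSketch {
  import RemoteReplicasMapSketch.{Replica, remoteReplicasMap}

  // Stand-in for Partition#getReplicaOrException: a reader running in the
  // gap between clear() and the puts finds no replicas at all, so an
  // otherwise healthy ReplicaManager#appendRecords call fails spuriously.
  def getReplicaOrException(brokerId: Int): Replica =
    Option(remoteReplicasMap.get(brokerId)).getOrElse(
      throw new IllegalStateException(s"Replica $brokerId not found"))
}
```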

While we want to polish the code so that this sort of race condition becomes
harder (or impossible) to introduce, it seems safest to revert to the previous
behavior given the timeline of the 2.6 release. Jira
https://issues.apache.org/jira/browse/KAFKA-10302 tracks further modifications
to the code.


> Partition#remoteReplicasMap can be empty in certain race conditions
> -------------------------------------------------------------------
>
>                 Key: KAFKA-10301
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10301
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Stanislav Kozlovski
>            Assignee: Stanislav Kozlovski
>            Priority: Blocker



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
