[ 
https://issues.apache.org/jira/browse/KAFKA-9632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajini Sivaram updated KAFKA-9632:
----------------------------------
    Description: 
When running this test with _numRecordsPerProducer=500_, the test fails 
intermittently. The test uses MockTime and runs concurrent log operations. This 
can deadlock when a segment is rolled, because of the way Log and MockScheduler 
interact. MockScheduler currently runs tasks while holding the MockScheduler 
lock. This can cause a deadlock if a thread attempts to schedule a task while 
holding a lock that is also acquired within a scheduled task.

The issue in this test occurs when these two operations happen concurrently:

1) LogManager.cleanupLogs is a scheduled task that acquires the Log lock. When 
run with MockScheduler, the thread holds the MockScheduler lock and then 
attempts to acquire the Log lock.

2) Partition.appendLogsToLeader holds the Log lock and attempts to acquire the 
MockScheduler lock in order to schedule a roll().

Since locking order is reversed in 1) and 2), this causes a deadlock.
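
The reversed locking order can be reproduced in isolation. The following is a 
minimal sketch, not Kafka code: the two ReentrantLock instances are hypothetical 
stand-ins for the MockScheduler lock and the Log lock, and the class and method 
names are made up for illustration. It uses tryLock with a timeout instead of 
lock() so that the example reports the blocked state rather than hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderInversionDemo {

    // Takes `first`, waits until the peer thread holds its own first lock,
    // then tries `second` with a timeout. Returns true if `second` was unavailable.
    static boolean blockedOnSecond(ReentrantLock first, ReentrantLock second,
                                   CountDownLatch bothHoldFirst, CountDownLatch bothTried) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();
            boolean acquired = second.tryLock(200, TimeUnit.MILLISECONDS);
            if (acquired) {
                second.unlock();
            }
            bothTried.countDown();
            bothTried.await(); // keep `first` until the peer has finished its attempt
            return !acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            first.unlock();
        }
    }

    static boolean inversionBlocksBothThreads() {
        ReentrantLock schedulerLock = new ReentrantLock(); // stand-in for the MockScheduler lock
        ReentrantLock logLock = new ReentrantLock();       // stand-in for the Log lock
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicBoolean cleanupBlocked = new AtomicBoolean();
        AtomicBoolean appendBlocked = new AtomicBoolean();

        // Order as in 1): scheduler lock first, then Log lock.
        Thread cleanup = new Thread(() ->
            cleanupBlocked.set(blockedOnSecond(schedulerLock, logLock, bothHoldFirst, bothTried)));
        // Order as in 2): Log lock first, then scheduler lock.
        Thread append = new Thread(() ->
            appendBlocked.set(blockedOnSecond(logLock, schedulerLock, bothHoldFirst, bothTried)));
        cleanup.start();
        append.start();
        try {
            cleanup.join();
            append.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return cleanupBlocked.get() && appendBlocked.get();
    }

    public static void main(String[] args) {
        System.out.println("both threads blocked: " + inversionBlocksBothThreads());
    }
}
```

With plain lock() calls in place of tryLock, neither thread would ever proceed, 
which is the deadlock described above.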

The test itself can be easily fixed by avoiding roll() in the test, but it 
would be good to fix MockScheduler so that it can be used in this case.
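
One possible direction for such a fix, sketched here under assumptions and not 
taken from the actual MockScheduler, is to hold the scheduler lock only while 
touching the task queue and to run each task after releasing it. The class 
SketchScheduler, its schedule/tick methods and the lockHeldByCurrentThread 
helper are hypothetical names used only for this illustration.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.concurrent.locks.ReentrantLock;

class SketchScheduler {

    private static final class Entry {
        final long dueMs;
        final long seq;
        final Runnable task;

        Entry(long dueMs, long seq, Runnable task) {
            this.dueMs = dueMs;
            this.seq = seq;
            this.task = task;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    // Ordered by due time, then by insertion order.
    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
        Comparator.comparingLong((Entry e) -> e.dueMs).thenComparingLong(e -> e.seq));
    private long nextSeq = 0;

    // The lock is held only while the queue is modified.
    void schedule(Runnable task, long dueMs) {
        lock.lock();
        try {
            queue.add(new Entry(dueMs, nextSeq++, task));
        } finally {
            lock.unlock();
        }
    }

    // Runs every task due at or before nowMs and returns how many ran.
    int tick(long nowMs) {
        int ran = 0;
        while (true) {
            Entry next;
            lock.lock();
            try {
                next = queue.peek();
                if (next == null || next.dueMs > nowMs) {
                    return ran;
                }
                queue.poll();
            } finally {
                lock.unlock();
            }
            // The task runs without the scheduler lock, so it may take other
            // locks (or call schedule()) without imposing a lock order.
            next.task.run();
            ran++;
        }
    }

    // Exposed only so the example can show that tasks run outside the lock.
    boolean lockHeldByCurrentThread() {
        return lock.isHeldByCurrentThread();
    }

    public static void main(String[] args) {
        SketchScheduler scheduler = new SketchScheduler();
        scheduler.schedule(
            () -> System.out.println("lock held in task: " + scheduler.lockHeldByCurrentThread()), 10);
        System.out.println("tasks run: " + scheduler.tick(10));
    }
}
```

With this shape, a thread that holds another lock can call schedule() while a 
task is waiting for that lock, because the task no longer holds the scheduler 
lock while it waits.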

 

  was:
When running this test with numRecordsPerProducer = 500, the test fails 
intermittently. The test uses MockTime and runs concurrent log operations. This 
can cause issues when attempting to roll a segment since Log and MockScheduler 
don't work well together. MockScheduler currently runs tasks while holding the 
MockScheduler lock. This can cause a deadlock if a thread attempts to schedule 
a task while holding a lock which is also acquired within a scheduled task.

The issue in this test occurs when these two operations happen concurrently:

1) LogManager.cleanupLogs is a scheduled task that acquires Log lock. When run 
with MockScheduler, the thread holds MockScheduler lock and then attempts to 
acquire Log lock.

2) Partition.appendLogsToLeader holds Log lock and attempts to acquire 
MockScheduler lock in order to schedule a roll().

Since locking order is reversed in 1) and 2), this causes a deadlock.

The test itself can be easily fixed by avoiding roll() in the test. But it will 
be good to fix MockScheduler to enable it to be used in this case.

 


> Transient test failure: PartitionLockTest.testAppendReplicaFetchWithUpdateIsr
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-9632
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9632
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.5.0
>            Reporter: Rajini Sivaram
>            Assignee: Rajini Sivaram
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
