[ https://issues.apache.org/jira/browse/HADOOP-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang updated HADOOP-12532:
-------------------------------------
    Description: 
I found a data race in ipc.Client.stop().

ipc.Client maintains a hash map of connection threads. When stop() is called,
it interrupts all connection threads; each thread is supposed to remove itself
from the hash map as part of its cleanup work, and stop() periodically checks
whether the hash map is empty and returns once it is.

The bug is that this check is not synchronized, and that a connection thread
actually removes itself from the hash map before it has finished closing its
connection. As a result, stop() can return while connections are still open.
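
A minimal sketch of one way to close this race, written against a hypothetical, simplified client model. SafeShutdownSketch, open() and stop() are illustrative names, not the ipc.Client API, and the sketch assumes one thread per connection keyed by an integer id. Each connection thread releases its connection before deregistering. stop() checks for an empty map while holding the same lock the threads use to deregister, using wait()/notifyAll() instead of unsynchronized polling.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of a client with one thread per connection.
// Not Hadoop's ipc.Client: all names here are illustrative assumptions.
final class SafeShutdownSketch {
    // Registry of live connection threads; always accessed under its own monitor.
    private final Map<Integer, Thread> connections = new HashMap<>();
    // Set once stop() begins; guarded by the connections monitor.
    private boolean stopped = false;
    // Counts connections whose resources were released (stands in for closing a socket).
    private final AtomicInteger closed = new AtomicInteger();

    void open(int id) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    Thread.sleep(10_000); // stands in for blocking on a socket read
                }
            } catch (InterruptedException e) {
                // Interrupted by stop(): fall through to cleanup.
            } finally {
                closed.incrementAndGet();     // 1. release the connection first
                synchronized (connections) {
                    connections.remove(id);   // 2. only then deregister
                    connections.notifyAll();  // 3. wake stop() if it is waiting
                }
            }
        }, "conn-" + id);
        synchronized (connections) {
            if (stopped) {
                throw new IllegalStateException("client already stopped");
            }
            // Register and start under the lock so stop() never sees an unstarted thread.
            connections.put(id, t);
            t.start();
        }
    }

    /** Interrupts every connection thread and waits until all have deregistered. */
    int stop() throws InterruptedException {
        synchronized (connections) {
            stopped = true;
            for (Thread t : connections.values()) {
                t.interrupt();
            }
            // The emptiness check runs under the lock the threads use to deregister;
            // wait() releases that lock while blocked so they can make progress.
            while (!connections.isEmpty()) {
                connections.wait();
            }
        }
        return closed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        SafeShutdownSketch client = new SafeShutdownSketch();
        for (int i = 0; i < 4; i++) {
            client.open(i);
        }
        System.out.println("closed=" + client.stop()); // closed=4
    }
}
```

Even in this sketch, an empty map only proves that every thread has finished its cleanup, not that it has exited. A caller that needs the threads gone must still join them.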

This bug causes a regression of HDFS-4925. In fact, the fix in HDFS-4925 may
not be correct, because it assumes that once QuorumJournalManager.close()
returns, the IPC client connection threads have terminated. In reality, the IPC
code only waits for connections to be closed, not for the IPC connection
threads to terminate (and even that guarantee is buggy, as described above).

This is also likely related to the bug reported in HDFS-4925 
(TestQuorumJournalManager.testPurgeLogs intermittently Fails 
assertNoThreadsMatching)

  was:
I found a data race in ipc.Client.stop().

ipc.Client maintains a hash map of connection threads. When stop() is called,
it interrupts all connection threads; each thread is supposed to remove itself
from the hash map as part of its cleanup work, and stop() periodically checks
whether the hash map is empty and returns once it is.

The bug is that this check is not synchronized, and that a connection thread
actually removes itself from the hash map before it has finished closing its
connection.

This bug causes a regression of HDFS-4925. In fact, the fix in HDFS-4925 may
not be correct, because it assumes that once QuorumJournalManager.close()
returns, the IPC client connection threads have terminated. In reality, the IPC
code only waits for connections to be closed, not for the thread (and even that
guarantee is buggy).

This is also likely related to the bug reported in HDFS-4925 
(TestQuorumJournalManager.testPurgeLogs intermittently Fails 
assertNoThreadsMatching)


> Data race in IPC client Client.stop()
> -------------------------------------
>
>                 Key: HADOOP-12532
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12532
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Wei-Chiu Chuang
>            Assignee: Wei-Chiu Chuang
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
