On 5 Sep 2006, at 13:43, Wim Oudshoorn wrote:
Debugging under Windows is a little tricky, but in our application I
observe the following deadlock:
Thread 8:
  NSMessagePort _setupSendPort     line 145  self = 0x19a5770  Block on lock: this->lock
  NSMessagePort newWithName        line 208                    Grabs lock: messagePortLock
  ...
  NSMessagePort receivedEventRead  line 638  self = 0x2a45c90
Thread 1:
  NSMessagePort newWithName        line 200                    Block on lock: messagePortLock
  ...
  NSMessagePort receivedEventRead  line 638  self = 0x19a5770  Grabs lock: 0x19a5770->lock
Consequence: DEADLOCK!
Here is a scenario in which we can end up in this situation.
1 - Thread 8 sends a message to thread 1.
2 - Thread 1 replies to thread 8.
3 - Thread X sends a message to thread 1.
4 - Thread 1 starts handling the message from thread X and grabs
    the 0x19a5770->lock.
5 - Thread 8 starts handling the reply of thread 1.
6 - Thread 8 reads the send port of the reply and tries to
    get the port that was used to send the reply.
    For this it calls newWithName.
7 - Thread 8 grabs the messagePortLock in newWithName.
8 - Thread 8 calls _setupSendPort on the message port 0x19a5770,
    which was used for sending.
9 - Thread 8 tries to grab 0x19a5770->lock but fails (held by
    thread 1 in step 4).
10 - Thread 1 continues and wants to deduce the port that thread X
     used for sending.
11 - Thread 1 calls newWithName and blocks on messagePortLock (held
     by thread 8 in step 7).
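
The interleaving above can be replayed deterministically in a single
thread, which makes the lock-order inversion easier to see. The sketch
below is only an illustration, not GNUstep code: global_lock and
port_lock are hypothetical stand-ins for messagePortLock and
0x19a5770->lock, and pthread_mutex_trylock is used in place of the
blocking lock calls so that the program reports the conflict instead of
hanging. It assumes default (non-recursive) POSIX mutexes.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks in the trace. */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER; /* messagePortLock */
static pthread_mutex_t port_lock   = PTHREAD_MUTEX_INITIALIZER; /* 0x19a5770->lock */

/*
 * Replays steps 4, 7, 9 and 11 in order.  Returns 1 if both lock
 * attempts find the lock already taken, i.e. both threads would
 * have blocked forever had they used a blocking lock call.
 */
int replay_interleaving(void)
{
    int rc;
    int thread8_blocked;
    int thread1_blocked;

    rc = pthread_mutex_lock(&port_lock);    /* step 4: thread 1 takes the port lock */
    assert(rc == 0);
    rc = pthread_mutex_lock(&global_lock);  /* step 7: thread 8 takes the global lock */
    assert(rc == 0);

    /* step 9: thread 8 wants the port lock, which thread 1 holds */
    thread8_blocked = (pthread_mutex_trylock(&port_lock) == EBUSY);
    /* step 11: thread 1 wants the global lock, which thread 8 holds */
    thread1_blocked = (pthread_mutex_trylock(&global_lock) == EBUSY);

    /* Undo steps 7 and 4 so the function can be called again. */
    pthread_mutex_unlock(&global_lock);
    pthread_mutex_unlock(&port_lock);
    (void)rc;

    return thread8_blocked && thread1_blocked;
}
```

Each side holds the lock the other one needs, and the two locks are
taken in opposite orders, which is the classic condition for this kind
of deadlock.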
So an obvious fix is to try to make the locks non-nesting in
newPortName: and initWithName:.
But:
A - I don't know if that is wrong
Seems plausible though.
B - I don't know if it is enough to fix the problem
I'm not sure either ... but there is no obvious way that this would
happen if the call to _setupSendPort is moved outside the region
protected by the messagePortLock ... so I've restructured the code
that way.
C - I just have this nagging feeling that _setupSendPort is
useless anyway. Why is it called on a port that already exists?
I think it is because the port may exist only for receiving and needs
to be set up for sending too.
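
For illustration, the restructuring described under B might look
roughly like the following C sketch. This is not the actual change to
NSMessagePort: port_t, known_ports, port_table_lock, port_with_name and
setup_send_port are hypothetical stand-ins for the port class, the port
table, messagePortLock, newWithName and _setupSendPort. The only point
is that the table lock is dropped before the per-port lock is taken, so
the two never nest.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a message port instance. */
typedef struct {
    const char     *name;
    pthread_mutex_t lock;        /* plays the role of port->lock */
    int             send_ready;  /* set once the send side is configured */
} port_t;

/* Plays the role of messagePortLock; guards known_ports. */
static pthread_mutex_t port_table_lock = PTHREAD_MUTEX_INITIALIZER;
static port_t known_ports[] = {
    { "reply-port", PTHREAD_MUTEX_INITIALIZER, 0 },
};

/* Stand-in for _setupSendPort: takes only the per-port lock. */
static void setup_send_port(port_t *port)
{
    pthread_mutex_lock(&port->lock);
    port->send_ready = 1;
    pthread_mutex_unlock(&port->lock);
}

/* Stand-in for newWithName: look up under the table lock, set up after. */
port_t *port_with_name(const char *name)
{
    port_t *found = NULL;
    size_t  i;

    assert(name != NULL);

    pthread_mutex_lock(&port_table_lock);
    for (i = 0; i < sizeof known_ports / sizeof known_ports[0]; i++) {
        if (strcmp(known_ports[i].name, name) == 0) {
            found = &known_ports[i];
            break;
        }
    }
    /* Real code would retain the port here, before unlocking, so that
     * it cannot be deallocated by another thread in the meantime. */
    pthread_mutex_unlock(&port_table_lock);

    if (found != NULL) {
        setup_send_port(found);  /* table lock is no longer held */
    }
    return found;
}
```

With this shape a thread that is stuck waiting for a port's own lock
no longer keeps every other caller of the lookup waiting as well.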
This code also suffered from the bug that we could potentially get
double deallocation of a port if one thread searched the table and
found it while another thread was performing a final release on it.
I've added an implementation of -release which should fix that.
We need to review all the places where objects are 'uniqued' in a
global table but are not permanently cached ... they probably all
suffer from the same problem and need fixing.
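
One common way to close this kind of race is to take the table lock
inside release, so that dropping the last reference and removing the
entry from the table happen as one step. The C sketch below shows that
idea under stated assumptions; it is not the -release implementation
that was actually added, and uniqued_t, unique_slot, unique_lock,
unique_get and unique_release are hypothetical names (the "table" here
has a single slot to keep the example short).

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical uniqued object with a manual reference count. */
typedef struct {
    int refcount;
} uniqued_t;

static pthread_mutex_t unique_lock = PTHREAD_MUTEX_INITIALIZER;
static uniqued_t      *unique_slot = NULL;  /* one-entry "table" */

/* Find the uniqued object, creating it if needed, and retain it
 * while still holding the table lock. */
uniqued_t *unique_get(void)
{
    uniqued_t *result;

    pthread_mutex_lock(&unique_lock);
    if (unique_slot == NULL) {
        unique_slot = calloc(1, sizeof *unique_slot);
    }
    if (unique_slot != NULL) {
        unique_slot->refcount++;
    }
    result = unique_slot;
    pthread_mutex_unlock(&unique_lock);
    return result;
}

/* Drop one reference.  The decrement and the removal from the table
 * happen under the same lock that lookups use, so a lookup can never
 * find an object whose final release is already in progress.
 * Returns 1 if the object was freed, 0 otherwise. */
int unique_release(uniqued_t *obj)
{
    int freed = 0;

    pthread_mutex_lock(&unique_lock);
    assert(obj->refcount > 0);
    if (--obj->refcount == 0) {
        if (unique_slot == obj) {
            unique_slot = NULL;
        }
        freed = 1;
    }
    pthread_mutex_unlock(&unique_lock);

    if (freed) {
        free(obj);  /* no other thread can reach obj any more */
    }
    return freed;
}
```

The same pattern would apply to any other class that is uniqued in a
global table without being permanently cached.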