-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/
-----------------------------------------------------------

(Updated 2012-04-11 02:57:20.281375)


Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve 
Huston.


Changes
-------

The cancelled read usually results in an "aborted" status, but depending on how 
far the socket close has progressed at the time the completion is posted, you 
can get other statuses instead, such as connection reset.  Any of those other 
statuses results in a spurious notifyDisconnect() and general mayhem from the 
deleted Socket.

Since closesocket() is a relatively long operation and the cancel operation 
occurred outside the completionQueue loop, the dislodged read completion could 
be processed in a separate thread resulting in a concurrent Socket::close() 
which, on occasion, yielded an exception.  This was fixed by moving the cancel 
inside the completionQueue loop so that the resulting completion would be 
serialized after the cancel.
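
To illustrate the ordering argument above, here is a minimal single-threaded 
model, not the actual AsynchIO.cpp code.  LoopModel, Item and trace are 
hypothetical names invented for this sketch.  The only point it makes is that 
when the cancel is handled inside the loop, the completion it dislodges is 
queued behind it and cannot be processed concurrently.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <vector>

// Hypothetical model types; these are not the real AsynchIO.cpp types.
enum class Item { CancelRead, ReadCompletion };

struct LoopModel {
    std::deque<Item> completionQueue;  // stand-in for the serialized completion queue
    bool readPending = true;           // one async read is outstanding
    std::vector<std::string> trace;    // order in which the loop handled items

    // A caller only enqueues the request; it never touches the socket state.
    void requestCancel() { completionQueue.push_back(Item::CancelRead); }

    // The loop is the only place where socket state is changed.
    void runLoop() {
        while (!completionQueue.empty()) {
            Item item = completionQueue.front();
            completionQueue.pop_front();
            if (item == Item::CancelRead) {
                trace.push_back("cancel");
                if (readPending) {
                    // The cancel dislodges the read.  Its completion lands
                    // behind the cancel instead of racing with it.
                    completionQueue.push_back(Item::ReadCompletion);
                }
            } else {
                readPending = false;
                trace.push_back("read-completion");
            }
        }
    }
};
```

Calling requestCancel() and then runLoop() on a fresh LoopModel handles the 
cancel first and the read completion second, on the same thread.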

So round three involves:

1. just using queuedClose to indicate a drained read
2. moving the socket.close() to serialize the read completion
3. adding a queuedDelete check before using a non-existent socket

Presumably #2 would never have occurred with CancelIoEx.  But it is probable 
that #1 would have been lurking, just occurring very rarely (depending on 
whether the other side closed its connection at just the right/wrong time).  #3 
can be attributed solely to my paranoia.
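
As a rough sketch of how checks #1 and #3 might fit together in a read 
completion handler (the names SocketState, Status and onReadCompletion are 
assumptions made for this example, and the real handler does considerably 
more):

```cpp
#include <cassert>
#include <string>

// Hypothetical completion statuses; the real code sees Windows error codes.
enum class Status { Ok, Aborted, ConnectionReset };

// Hypothetical stand-in for the per-socket state flags.
struct SocketState {
    bool queuedClose = false;
    bool queuedDelete = false;
    bool readDrained = false;
    int disconnectNotifications = 0;
    int socketUses = 0;
};

// Returns a short label for the action taken, to make the sketch checkable.
std::string onReadCompletion(SocketState& s, Status status) {
    // #3: the socket may already be gone, so do not touch it.
    if (s.queuedDelete) return "ignored";

    // #1: while closing, any completion just means the read has drained,
    // whatever status it carries (aborted, connection reset, ...).
    if (s.queuedClose) {
        s.readDrained = true;
        return "drained";
    }

    // Normal operation: the socket is used and errors are reported.
    ++s.socketUses;
    if (status != Status::Ok) {
        ++s.disconnectNotifications;
        return "disconnect";
    }
    return "data";
}
```

With queuedClose set, a connection-reset status is reported as "drained" and 
the disconnect counter does not move, which is the behaviour #1 is after.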


Summary
-------

The cause of the hang was an outstanding read side completion when the AsynchIO 
object in charge of the socket was in the queuedClose state.

The completion handler drains outstanding async requests before closing the 
socket.  Since the cable had been pulled, the async read would never complete 
until Windows gave up on the socket altogether (some time much later).

This patch remembers the last aio read and cancels it if the AsynchIO object is 
in the queuedClose state before blocking again.
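
The bookkeeping described above could look roughly like the following.  This is 
an illustrative sketch only: ReadTracker and its members are hypothetical names, 
the integer id stands in for whatever handle the real code keeps for the read, 
and the actual cancel call is left out.

```cpp
#include <cassert>

// Hypothetical tracker for the most recently issued async read.
struct ReadTracker {
    bool hasLastRead = false;   // is a read still outstanding?
    int lastReadId = 0;         // placeholder for the real read handle
    bool cancelIssued = false;  // avoid cancelling the same read twice
    bool queuedClose = false;   // set once a close has been requested

    void issueRead(int id) {
        hasLastRead = true;
        lastReadId = id;
        cancelIssued = false;
    }

    void readCompleted(int id) {
        if (hasLastRead && lastReadId == id) hasLastRead = false;
    }

    // Called before the handler would block again.  Returns true when the
    // caller should cancel the remembered read.
    bool cancelBeforeBlocking() {
        if (queuedClose && hasLastRead && !cancelIssued) {
            cancelIssued = true;
            return true;
        }
        return false;
    }
};
```

While the object is not closing, cancelBeforeBlocking() returns false and the 
read is left alone; once queuedClose is set it returns true exactly once for 
the outstanding read.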


Aside from the basic description from the Jira, I also removed an unused test 
for restartRead, which doesn't change the logic of the section, but may 
indicate an intention that wasn't fully coded or something left over from a 
previous change.


This addresses bug QPID-3759.
    https://issues.apache.org/jira/browse/QPID-3759


Diffs (updated)
-----

  
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 

Diff: https://reviews.apache.org/r/4383/diff


Testing
-------

qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes


Thanks,

Cliff
