[ 
https://issues.apache.org/jira/browse/QPID-3759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238653#comment-13238653
 ] 

[email protected] commented on QPID-3759:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4383/
-----------------------------------------------------------

(Updated 2012-03-26 18:26:34.827957)


Review request for qpid, Andrew Stitcher, Ted Ross, Chug Rolke, and Steve 
Huston.


Changes
-------

This patch follows the same logic as the previous one while avoiding CancelIoEx.

CancelIo was considered as a substitute for CancelIoEx, but it only cancels 
I/O issued by the calling thread, a restriction that would have required a 
major rewrite of the base code.

I have substituted a much blunter instrument to force the completion, namely 
a full closesocket to unstick the read.  It forces all pending overlapped 
operations to complete, which in our case is just the last read.
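
To make the effect of the close concrete, here is a small portable model.  It 
is only a sketch and not the code in AsynchIO.cpp: CompletionQueueModel, 
SocketModel and every member name are hypothetical, and no Windows API is 
called.  It assumes nothing beyond what is stated above, that closing the 
socket drives every pending operation to a completion the poller can pick up.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy model only: none of these names exist in qpid or in the Windows API.
enum class Result { Completed, Aborted };

struct Completion {
    int opId;
    Result result;
};

// Stands in for an I/O completion port.
class CompletionQueueModel {
public:
    void post(const Completion& c) { ready_.push_back(c); }

    // Returns false when nothing is ready; a real poller would block here.
    bool tryDequeue(Completion& out) {
        if (ready_.empty()) return false;
        out = ready_.front();
        ready_.pop_front();
        return true;
    }

private:
    std::deque<Completion> ready_;
};

// Stands in for a socket with overlapped operations in flight.
class SocketModel {
public:
    explicit SocketModel(CompletionQueueModel& queue)
        : queue_(queue), open_(true) {}

    void startRead(int opId) {
        assert(open_);  // no new operations on a closed socket
        pending_.push_back(opId);
    }

    // Models the closesocket effect: every pending operation is
    // completed as aborted. Returns how many were forced to complete.
    std::size_t close() {
        const std::size_t forced = pending_.size();
        for (std::size_t i = 0; i < pending_.size(); ++i) {
            queue_.post(Completion{pending_[i], Result::Aborted});
        }
        pending_.clear();
        open_ = false;
        return forced;
    }

private:
    CompletionQueueModel& queue_;
    std::vector<int> pending_;
    bool open_;
};
```

In this model a read started before the cable pull leaves tryDequeue 
returning false indefinitely; after close() the same call yields one aborted 
completion, which is what lets the waiting thread make progress.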


Summary
-------

The cause of the hang was an outstanding read side completion when the AsynchIO 
object in charge of the socket was in the queuedClose state.

The completion handler drains outstanding async requests before closing the 
socket.  Since the cable had been pulled, the async read would never complete 
until Windows gave up on the socket altogether (some time much later).

This patch remembers the last aio read and, if the object is in the 
queuedClose state, cancels it before blocking again.
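
The decision described above can be sketched as a tiny state machine.  This 
is an illustration under stated assumptions, not the patch itself: the class 
AsynchIoModel, its State enum and its method names are hypothetical, and the 
cancel is reduced to a counter instead of a real call on the socket.  Only 
the queuedClose state and the idea of remembering the last read are taken 
from the summary.

```cpp
#include <cassert>

// Hypothetical model of the close path; this is not the qpid AsynchIO class.
class AsynchIoModel {
public:
    enum class State { Open, QueuedClose, Closed };

    AsynchIoModel()
        : state_(State::Open), readOutstanding_(false), cancels_(0) {}

    void startRead() {
        assert(state_ == State::Open);  // reads are only issued while open
        readOutstanding_ = true;        // remember the last read
    }

    void readCompleted() { readOutstanding_ = false; }

    void queueClose() {
        if (state_ == State::Open) state_ = State::QueuedClose;
    }

    // Called before the I/O thread would block again.
    // Returns true if the thread still has something to wait for.
    bool beforeBlocking() {
        if (state_ == State::QueuedClose && readOutstanding_) {
            // Without this step the thread would wait on a read that
            // cannot complete once the cable is pulled.
            ++cancels_;
            readOutstanding_ = false;
        }
        if (state_ == State::QueuedClose && !readOutstanding_) {
            state_ = State::Closed;  // nothing left to drain
        }
        return state_ != State::Closed;
    }

    State state() const { return state_; }
    int cancels() const { return cancels_; }

private:
    State state_;
    bool readOutstanding_;
    int cancels_;
};
```

While the object is open, beforeBlocking() leaves the read alone and the 
thread waits as usual.  Once a close has been queued, the remembered read is 
cancelled exactly once and the model moves to Closed instead of waiting.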


Beyond the basic fix described in the Jira, I also removed an unused test for 
restartRead.  This doesn't change the logic of the section, but the test may 
indicate an intention that wasn't fully coded, or something left over from a 
previous change.


This addresses bug QPID-3759.
    https://issues.apache.org/jira/browse/QPID-3759


Diffs (updated)
-----

  
  http://svn.apache.org/repos/asf/qpid/trunk/qpid/cpp/src/qpid/sys/windows/AsynchIO.cpp 1301636 

Diff: https://reviews.apache.org/r/4383/diff


Testing
-------

qpid-perftest, qpid-send, qpid-receive, cable pulls, broker pause/resumes


Thanks,

Cliff


                
> Heartbeat timeout in Windows does not lead to timely reconnect
> --------------------------------------------------------------
>
>                 Key: QPID-3759
>                 URL: https://issues.apache.org/jira/browse/QPID-3759
>             Project: Qpid
>          Issue Type: Bug
>          Components: C++ Client
>    Affects Versions: 0.14
>         Environment: Windows C++ messaging
>            Reporter: Chuck Rolke
>            Assignee: Cliff Jansen
>             Fix For: 0.17
>
>         Attachments: main.cpp
>
>
> Reported by Wolf Wolfswinkel on Qpid users, 22-Dec-2011:
> http://qpid.2158936.n2.nabble.com/Heartbeats-in-C-broker-on-Windows-td7118702.html
> The simplest test case is in attached main.cpp. Establish a good network 
> connection to the broker and then start the program. It creates a connection, 
> sends two messages, and then pauses for 15 seconds. During the pause, 
> disconnect the network connection to the broker for at least two heartbeat 
> timeouts (12 seconds).
> After the heartbeat timeout the timer task fires and a debug trace shows:
>  Traffic timeout,  TCPConnector::abort, TCPConnector::eof, TCPConnector::close
> But the connection is not actually closed until something happens on the 
> network to wake up the thread waiting in Poller::run().
> The timer event appears unable to interrupt the IO thread waiting for the 
> completion port.
