[ 
https://issues.apache.org/jira/browse/DISPATCH-1888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17263407#comment-17263407
 ] 

Ken Giusti commented on DISPATCH-1888:
--------------------------------------

After analyzing the logs from this failure:

{{70: Traceback (most recent call last):}}
{{70:   File "/home/kgiusti/work/dispatch/qpid-dispatch/tests/system_tests_http1_adaptor.py", line 1685, in test_03_bad_response_message}}
{{70:     self.assertIsNone(error)}}
{{70: AssertionError: 'client failed: Bad response code, expected 200 got 503' is not None}}

I've concluded that this is a race in the test itself, and that the router is 
actually 'doing the right thing' by responding with Service Unavailable (503).

The test performs two HTTP transactions back to back.  The first is simulating 
a badly encoded HTTP response, followed up by a valid GET operation to verify 
the router has properly recovered from the first transaction.

The test works like this:
 # a dummy amqp consumer is attached to the configured HTTP server port.  This 
is not a true server, just a consumer to provide a source of credit and fool 
the adaptor into thinking a server is attached
 # a dummy amqp producer is attached to the router's amqp listener port.  This 
dummy producer is used to generate the specially tailored bad response to the 
request, since a real http server won't send bad responses.
 # client connects to router, issues GET request
 # the request is sent to the dummy amqp consumer, which simply accepts and 
drops it
 # this causes the test to trigger the dummy producer to send the bad response 
to the http bridge address
 # the router adaptor receives the bad response and fails to parse it as 
intended
 # the router adaptor rejects the bad response message, and generates an HTTP 
response containing a server error code and sends it to the client
 # the test client verifies that the response is a server error response and 
closes its connection to the router
 # at this point the first transaction is complete and the test closes the fake 
amqp producer and the fake server attached to the HTTP server port
 # at this point the adaptor should get the RAW_DISCONNECT event, but when the 
failure occurs the disconnect event has not arrived... yet
 # the next transaction begins: a client connects and issues a GET request
 # the GET request is written to the not-yet-closed fake server connection
 # after it is written, the DISCONNECT finally arrives
 # the router then sends the (valid but unexpected) Service Unavailable 
response

I suspect this delay in closing the old connection is due to Python's 
asynchronous socket close implementation.  Calling socket.close() in Python 
does not necessarily close the underlying TCP socket immediately.  In that case 
the close occurs when the socket is eventually released by the garbage collector.
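
As an illustration only, here is one way a close() call can leave the connection open in CPython: a file object created with socket.makefile() keeps the underlying descriptor alive after socket.close() until that file object is closed as well. Whether this is what happens in the test clients is an assumption, not something confirmed from the logs, and the names peer_sees_eof and demo_deferred_close are made up for this sketch.

```python
import socket


def peer_sees_eof(sock):
    """Return True if a non-blocking read on sock reports end-of-stream."""
    sock.setblocking(False)
    try:
        return sock.recv(1) == b""
    except BlockingIOError:
        # Nothing to read and no EOF: the other end is still open.
        return False


def demo_deferred_close():
    """Show that close() is deferred while a makefile() object is alive."""
    ours, theirs = socket.socketpair()
    reader = ours.makefile("rb")   # extra reference to the same socket
    ours.close()                   # marks the socket closed; descriptor stays open
    eof_before = peer_sees_eof(theirs)
    reader.close()                 # last reference released; descriptor really closes
    eof_after = peer_sees_eof(theirs)
    theirs.close()
    return eof_before, eof_after
```

In this sketch the peer only observes the disconnect after the last reference is released, which is the same ordering problem described in the list above.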

The test clients need to implement a synchronous socket close by using 
socket.shutdown() then an explicit del of the socket attribute.
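
A minimal sketch of that kind of synchronous close, assuming the test client keeps its socket in an attribute. The class name FakePeer and the attribute name sock are hypothetical and are not taken from system_tests_http1_adaptor.py.

```python
import socket


class FakePeer:
    """Hypothetical stand-in for a test client or fake server owning one socket."""

    def __init__(self, sock):
        self.sock = sock

    def close(self):
        sock = getattr(self, "sock", None)
        if sock is None:
            return  # already closed
        try:
            # Tell the peer right away that no more data will flow either way.
            sock.shutdown(socket.SHUT_RDWR)
        except OSError:
            pass  # socket was not connected, or the peer is already gone
        sock.close()
        # Drop the attribute so no reference to the socket lingers on the object.
        del self.sock
```

With shutdown() called first, the other end of the connection sees end-of-stream as soon as close() returns, rather than whenever the last reference to the socket happens to be released.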


> HTTP1: CI failure in test_02_bad_request_message and 
> test_03_bad_response_message
> ---------------------------------------------------------------------------------
>
>                 Key: DISPATCH-1888
>                 URL: https://issues.apache.org/jira/browse/DISPATCH-1888
>             Project: Qpid Dispatch
>          Issue Type: Test
>          Components: Protocol Adaptors
>    Affects Versions: 1.15.0
>            Reporter: Ken Giusti
>            Assignee: Ken Giusti
>            Priority: Critical
>             Fix For: 1.15.0
>
>
> These testcases fail frequently on Travis CI.   Needs investigation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
