[ https://issues.apache.org/jira/browse/QPID-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13131926#comment-13131926 ]

Rajith Attapattu commented on QPID-3532:
----------------------------------------

Robbie, thanks for the quick response.

I can try to test this with a clustered setup. However, we have our f2f 
meeting next week, so if I don't get it done by tomorrow it will be delayed 
by a week. That is one reason for my reluctance to see this go into 0.14.

While it is nice to have automated tests, through experience I've come to 
realize that they provide very little confidence for a feature like 
failover. Almost all of the previous failover issues were identified by 
manual testing in more realistic scenarios or in production environments.

Most issues tend to occur when failover happens while a client is running at 
full speed (producing, consuming, creating, querying, etc.). It is almost 
impossible to replicate this kind of scenario with automated tests. The fact 
that there was no testing resembling production scenarios on your end makes 
me a bit concerned about the changes.

I've tried to semi-automate tests like this using the Python testkit, but never 
got it into a reasonable state, as it kept falling behind the changes made on 
the clustering side and in brokertest.py.

(*) Another hidden danger of such a change is a possible impact on 
performance, which cannot be verified merely by running automated tests. 
Again, an agreed-upon framework for benchmarking before and after fixes we 
suspect of being performance-sensitive would be very useful. We could also 
run it on a per-release basis to determine whether we have improved or 
regressed on the performance front.

Anyway, if you feel confident about these changes and strongly feel they 
should be included in 0.14, then I'll take your word for it and won't make a 
fuss about it. But I have to note that I have very little confidence in the 
changes or in the testing done here so far. Please note this is not a 
reflection on the work you've done for this particular patch, but rather on 
the inadequate testing strategy we have for the client in general.

Regards,

Rajith
                
> Fix the blocking of JMS operations when failover happens
> --------------------------------------------------------
>
>                 Key: QPID-3532
>                 URL: https://issues.apache.org/jira/browse/QPID-3532
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Client
>            Reporter: Alex Rudyy
>            Assignee: Alex Rudyy
>             Fix For: 0.13
>
>         Attachments: QPID-3532-make-the-0-10-client-hold-the-failover-mutex-during-failover.patch
>
>
> When the connection is lost and failover starts, the Qpid client should block 
> on invocations of JMS operations that require sending or receiving data over 
> the network.
> With the current implementation, performing certain operations during 
> failover can lead to unexpected behaviour.
> For example, closing QueueBrowsers during failover has been observed to cause 
> issues: because the close and failover are allowed to progress concurrently, 
> the old subscription destination can be sent in a cancel command to the new 
> broker. As a result, the broker might close the session with a NOT_FOUND 
> execution exception, because failover has not yet finished re-creating the 
> queue on the new broker.
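
The blocking behaviour the issue asks for can be illustrated with a small, self-contained Java sketch. It is not the Qpid client code: the class FailoverMutexSketch, its methods, the recorded command names, and the use of a java.util.concurrent ReentrantLock as the "failover mutex" are all hypothetical stand-ins. The failover thread holds the mutex for the whole reconnect and queue re-creation step, and closing the browser must acquire the same mutex before sending its cancel, so the cancel cannot reach the new broker before the queue exists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of a client that serialises JMS operations with failover.
public class FailoverMutexSketch {
    // Stand-in for the client's failover mutex (assumption: a ReentrantLock).
    private final ReentrantLock failoverMutex = new ReentrantLock();
    // Ordered log of "commands" sent to the (new) broker.
    private final List<String> sent = Collections.synchronizedList(new ArrayList<>());

    // Failover holds the mutex while reconnecting and re-creating the queue.
    void failover(CountDownLatch started) throws InterruptedException {
        failoverMutex.lock();
        try {
            started.countDown();      // mutex is held; let the closing thread race us
            Thread.sleep(50);         // simulate reconnect latency
            sent.add("recreate-queue");
        } finally {
            failoverMutex.unlock();
        }
    }

    // Closing a browser blocks until any in-progress failover has finished.
    void closeBrowser() {
        failoverMutex.lock();
        try {
            sent.add("cancel-subscription");
        } finally {
            failoverMutex.unlock();
        }
    }

    // Runs close() concurrently with an active failover and returns the command order.
    static List<String> simulate() throws InterruptedException {
        FailoverMutexSketch client = new FailoverMutexSketch();
        CountDownLatch failoverStarted = new CountDownLatch(1);
        Thread failoverThread = new Thread(() -> {
            try {
                client.failover(failoverStarted);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        failoverThread.start();
        failoverStarted.await();      // failover now holds the mutex
        client.closeBrowser();        // blocks until failover releases it
        failoverThread.join();
        return new ArrayList<>(client.sent);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(simulate());
    }
}
```

Running main prints [recreate-queue, cancel-subscription]. If closeBrowser() did not take the lock, the cancel could be recorded before the queue is re-created, which is the NOT_FOUND race described in the issue.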

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
Apache Qpid - AMQP Messaging Implementation
Project:      http://qpid.apache.org
Use/Interact: mailto:[email protected]
