[ 
https://issues.apache.org/jira/browse/QPID-2994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12991728#comment-12991728
 ] 

Rajith Attapattu commented on QPID-2994:
----------------------------------------

The commit made in rev 1057460 uncovered a deeper issue that violates the 
atomicity of a transaction disrupted by failover.
The symptom was that one or two messages seemed to get onto the queue outside 
of the transaction boundaries.
Upon closer inspection, these were messages that belonged to the failed 
transaction. If the application retries the failed transaction, this results 
in duplicates, further complicating the issue.

The underlying root cause is as follows.
1. When a message-transfer reaches the invoke method in Session.java while the 
session state is detached, the thread waits until the session is OPEN or 
CLOSED.

2. If failover completes within the wait period, the session is resumed and 
marked OPEN, and the message transfer in progress simply resumes and reaches 
the broker.

3. At this point the session is still not marked transactional (and there is 
no logic in place to issue a txSelect after failover), so the message is 
enqueued outside any transaction.

4. In the meantime, the JMS session used by the application learns that 
failover has happened, is marked dirty, and the application receives an 
exception.

5. If the application chooses to keep using the session (ignoring the 
exception), subsequent message transfers will reach the queue on the broker, 
but the session will be closed once it sends a commit (or a rollback), as the 
broker will complain that the session is not transactional.

6. If the application chooses to create a new session, it will send 
subsequent messages within transaction boundaries and work as expected. 
However, the queue will still hold the extra one or two messages that sneaked 
in when the old session was reopened. If the application retries the aborted 
transaction, this results in duplicates because of the messages that sneaked 
in.
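
The sequence in steps 1 to 6 can be replayed with a small toy model. This is 
only a sketch: ToyBroker, ToyBrokerSession and FailoverLeakDemo are 
hypothetical names invented for illustration and are not part of the Qpid 
client or broker. The model assumes that a broker-side session enqueues a 
transfer immediately unless txSelect was issued on it.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy broker holding a single queue. */
class ToyBroker {
    final List<String> queue = new ArrayList<>();
}

/** Hypothetical broker-side view of a session; not a Qpid class. */
class ToyBrokerSession {
    private final ToyBroker broker;
    private final List<String> pending = new ArrayList<>();
    private boolean transactional = false;

    ToyBrokerSession(ToyBroker broker) {
        this.broker = broker;
    }

    void txSelect() {
        transactional = true;
    }

    void transfer(String message) {
        if (transactional) {
            pending.add(message);       // held until commit
        } else {
            broker.queue.add(message);  // enqueued straight away
        }
    }

    void commit() {
        if (!transactional) {
            throw new IllegalStateException("session is not transactional");
        }
        broker.queue.addAll(pending);
        pending.clear();
    }
}

class FailoverLeakDemo {
    static List<String> run() {
        ToyBroker broker = new ToyBroker();

        // Original transacted session: m1 and m2 are sent but never committed.
        ToyBrokerSession original = new ToyBrokerSession(broker);
        original.txSelect();
        original.transfer("m1");
        original.transfer("m2");

        // Failover: the session is resumed, but no txSelect is reissued.
        // The transfer that was waiting in invoke (m3) now goes through.
        ToyBrokerSession resumed = new ToyBrokerSession(broker);
        resumed.transfer("m3");

        // The application creates a new session and retries the transaction.
        ToyBrokerSession fresh = new ToyBrokerSession(broker);
        fresh.txSelect();
        fresh.transfer("m1");
        fresh.transfer("m2");
        fresh.transfer("m3");
        fresh.commit();

        return broker.queue;
    }
}
```

In this model FailoverLeakDemo.run() leaves m3 on the queue twice: once from 
the resumed, non-transactional session and once from the retried transaction.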

A reasonable solution to this issue is to:
1) Close a session marked transactional immediately when the session detaches, 
i.e. a transactional session is never resumed and a new session must be 
created to continue. 

2) Document this behavior clearly.
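
A minimal sketch of rule 1), assuming a simplified session with only three 
states. SketchSession and its methods are hypothetical and only mirror the 
behavior described above; this is not the actual change to Session.java.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for the client session. */
class SketchSession {
    enum State { OPEN, DETACHED, CLOSED }

    private final boolean transactional;
    private final List<String> sentToBroker = new ArrayList<>();
    private State state = State.OPEN;

    SketchSession(boolean transactional) {
        this.transactional = transactional;
    }

    /** Called when the connection is lost. */
    synchronized void detached() {
        // Proposed rule: a transactional session is closed, never resumed.
        state = transactional ? State.CLOSED : State.DETACHED;
        notifyAll();
    }

    /** Called when failover completes; only a DETACHED session reopens. */
    synchronized void resumed() {
        if (state == State.DETACHED) {
            state = State.OPEN;
        }
        notifyAll();
    }

    /** Simplified analogue of invoke: wait while detached, then send or fail. */
    synchronized void transfer(String message, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (state == State.DETACHED) {
            long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
            if (remainingMillis <= 0) {
                throw new IllegalStateException("timed out waiting for session");
            }
            try {
                wait(remainingMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        if (state == State.CLOSED) {
            throw new IllegalStateException("session closed");
        }
        sentToBroker.add(message);
    }

    synchronized State state() {
        return state;
    }

    synchronized List<String> sent() {
        return new ArrayList<>(sentToBroker);
    }
}
```

With this rule, a transfer that was waiting on a transactional session fails 
after failover instead of reaching the broker, so the application has to 
create a new session and redo the transaction.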

(*) Also, during the investigation I found a race condition where an 
application could create a new session (recreating one after an exception, or 
creating a completely new session in the midst of failover) before the 
connection is open.
This results in a session attach being sent before the connection negotiation 
is completed. Although the connect method and the createSession method in 
Connection.java contend for the same lock, the connect method, which acquires 
it first, releases the lock while it waits (until the connection reaches the 
OPEN state), and the createSession method waiting on the lock then acquires it 
and continues. This actually exposed a bug in the C++ broker; see QPID-3033.
We need to ensure that the createSession method is not executed until the 
connection reaches the OPEN state. I will open a separate JIRA for this.
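
One possible shape for such a guard, as a sketch only: since Object.wait() 
releases the monitor while the connecting thread waits, holding the lock does 
not by itself prove the connection is open, so createSession can re-check the 
state in a loop. SketchConnection, beginConnect and negotiationCompleted are 
hypothetical names and do not reflect the real Connection.java.

```java
/** Hypothetical stand-in for the client connection; not the Qpid class. */
class SketchConnection {
    enum State { NEW, OPENING, OPEN, CLOSED }

    private final Object lock = new Object();
    private State state = State.NEW;
    private int sessionsCreated = 0;

    /** Connection negotiation has started (analogue of entering connect). */
    void beginConnect() {
        synchronized (lock) {
            state = State.OPENING;
            lock.notifyAll();
        }
    }

    /** Connection negotiation has finished successfully. */
    void negotiationCompleted() {
        synchronized (lock) {
            state = State.OPEN;
            lock.notifyAll();
        }
    }

    /**
     * Acquiring the lock is not sufficient, because the connecting thread
     * gives it up inside wait(). The state is re-checked in a loop instead.
     */
    int createSession(long timeoutMillis) {
        synchronized (lock) {
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (state == State.OPENING) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    throw new IllegalStateException("connection did not open in time");
                }
                try {
                    lock.wait(remainingMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting", e);
                }
            }
            if (state != State.OPEN) {
                throw new IllegalStateException("connection is not open: " + state);
            }
            // The session attach would be sent here, after negotiation.
            return ++sessionsCreated;
        }
    }
}
```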

(*) Another race condition I found: if a session is created (after the 
connection is over) but before the resume method (in Connection.java) is 
called, the new session ends up being reattached. This could result in 
unnecessary duplication of messages.
We need to ensure that the createSession method is not executed until the 
resume method has completed. Again, I will open a separate JIRA for this.


> transactions atomicity violated by 'transparent' failover
> ---------------------------------------------------------
>
>                 Key: QPID-2994
>                 URL: https://issues.apache.org/jira/browse/QPID-2994
>             Project: Qpid
>          Issue Type: Bug
>          Components: Java Client
>    Affects Versions: 0.6, 0.7, 0.8
>            Reporter: Rajith Attapattu
>            Assignee: Rajith Attapattu
>             Fix For: Future
>
>
> The messages published within a batch at the point the connection fails over 
> appear to be replayed outside of any transaction.
> Steps to Reproduce:
> 1. start transactional session on failover enabled connection
> 2. send batches of messages in transactions
> 3. kill the cluster node the client is connected to, to trigger failover 
> mid-transaction
> This happens due to the lower layer replaying unacked messages upon resuming 
> the connection.
> Message replay should not happen on a transacted session as there is no 
> benefit of doing so.


        
