[jira] Created: (AMQ-1350) JDBC master/slave does not work properly with datasources that can reconnect to the database

Eric Anderson (JIRA) Fri, 27 Jul 2007 14:39:09 -0700

JDBC master/slave does not work properly with datasources that can reconnect to 
the database
--------------------------------------------------------------------------------------------


                 Key: AMQ-1350
                 URL: https://issues.apache.org/activemq/browse/AMQ-1350
             Project: ActiveMQ
          Issue Type: Bug
          Components: Message Store
    Affects Versions: 5.x
         Environment: Linux x86_64, Sun JDK 1.6, PostgreSQL 8.2.4, c3p0 or
                      other pooling datasources
            Reporter: Eric Anderson


This problem affects the JDBC master/slave configuration when the database 
server is restarted, or when the brokers temporarily lose their JDBC 
connections for any reason, and the datasource in use can re-establish stale 
connections before handing them to the broker.

The problem lies with the JDBC locking strategy used to determine which broker 
is master and which are slaves.  Let's say there are two brokers, a master and 
a slave, and they've successfully initialized.  If you restart the database 
server, the slave will throw an exception, because the statement it was 
blocked on while waiting for the lock has just failed.  The slave will then 
*retry* the process of getting the lock over and over again.  Now, since the 
database was bounced, the *master* will have lost its lock in the 
activemq_lock table.  However, with the current 4.x-5.x code, it will never 
"know" that it has lost the lock, because there is no mechanism to check the 
lock state.  So it will continue to think that it is the master and will leave 
all of its network connectors active.
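
To illustrate the kind of lock-state check that is missing, here is a minimal 
sketch.  It is not ActiveMQ code: LockLeaseCheck, connectionProbe, checkOnce 
and the onLockLost callback are hypothetical names.  It assumes the master 
keeps a reference to the physical JDBC connection on which it took the lock, 
and treats that connection becoming invalid as "the lock is gone".  The only 
JDBC call used is the standard Connection.isValid(int).

```java
import java.sql.Connection;
import java.util.concurrent.Callable;

/** Hypothetical helper: lets a master notice that its lock connection died. */
final class LockLeaseCheck {

    /**
     * Probe based on the connection that acquired the lock. Assumption: the
     * database releases the lock when this session ends, so an invalid
     * connection implies a lost lock. A connection freshly borrowed from a
     * pool must NOT be used here, since it would look healthy after a
     * reconnect even though the lock is gone.
     */
    static Callable<Boolean> connectionProbe(final Connection lockConnection,
                                             final int timeoutSeconds) {
        return () -> lockConnection.isValid(timeoutSeconds);
    }

    /**
     * Runs the probe once. Returns true if the lock still appears to be held;
     * otherwise invokes onLockLost and returns false. A probe that throws is
     * treated the same as a probe that reports the lock as lost.
     */
    static boolean checkOnce(Callable<Boolean> probe, Runnable onLockLost) {
        boolean held;
        try {
            held = Boolean.TRUE.equals(probe.call());
        } catch (Exception e) {
            held = false;
        }
        if (!held) {
            onLockLost.run();
        }
        return held;
    }
}
```

A broker could call checkOnce on a timer and, in onLockLost, stop its 
connectors and go back to competing for the lock as a slave.  Whether that 
fits the broker's actual lifecycle is an open question; the sketch only shows 
the shape of the check.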

When the slave tries to acquire the lock now, if the datasource has restored 
connections to the now-restarted database server, it will succeed.  The slave 
will come up as master, and there will be two masters active concurrently.  
Both masters should at this point be fully functional, as both will have 
datasources that can talk to the database server once again.
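
The sequence above can be reproduced with a toy model that uses no real 
database.  All class names here (LockTableModel, BrokerModel, 
TwoMasterSimulation) are made up for illustration, and tryAcquire is a 
non-blocking stand-in for the blocking lock statement.  The one assumption 
carried over from the description is that a database restart releases the 
lock without notifying the broker that held it.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy database: the lock is held per session and dies with the session. */
final class LockTableModel {
    private String holder; // name of the broker holding the lock, or null

    /** Grants the lock if it is free or already held by this broker. */
    boolean tryAcquire(String broker) {
        if (holder == null) {
            holder = broker;
            return true;
        }
        return holder.equals(broker);
    }

    /** A restart ends every session, so the lock is silently released. */
    void restart() {
        holder = null;
    }
}

/** Toy broker: once it believes it is master, nothing ever revokes that. */
final class BrokerModel {
    final String name;
    boolean believesMaster;

    BrokerModel(String name) {
        this.name = name;
    }

    /** One iteration of the slave's retry loop. */
    void retryLock(LockTableModel db) {
        if (db.tryAcquire(name)) {
            believesMaster = true;
        }
    }
}

final class TwoMasterSimulation {
    /** Replays the failure and returns the brokers that think they are master. */
    static List<String> run() {
        LockTableModel db = new LockTableModel();
        BrokerModel a = new BrokerModel("A");
        BrokerModel b = new BrokerModel("B");

        a.retryLock(db); // A starts first and becomes master
        b.retryLock(db); // B cannot get the lock and stays slave
        db.restart();    // database bounced: lock released, A is not told
        b.retryLock(db); // B's datasource has reconnected: lock granted

        List<String> masters = new ArrayList<>();
        for (BrokerModel broker : new BrokerModel[] {a, b}) {
            if (broker.believesMaster) {
                masters.add(broker.name);
            }
        }
        return masters;
    }

    public static void main(String[] args) {
        // prints "Brokers that believe they are master: [A, B]"
        System.out.println("Brokers that believe they are master: " + run());
    }
}
```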

I have tested this with c3p0 and verified that I get two masters after bouncing 
the database server.  If, at that point, I kill the original slave broker, the 
original master still appears to be functioning normally.  If, instead, I kill 
the original master broker, messages are still delivered via the original slave 
(now co-master).  It does not seem to matter which broker the clients connect 
to - both work.

There is no workaround that I can think of that would function correctly across 
multiple database bounces.  If a slave's datasource does not have the 
functionality to do database reconnects, then, after the first database server 
restart, it will never be able to establish a connection to the db server in 
order to attempt to acquire the lock.  In addition, the JDBC master/slave 
topology does not have any favored brokers: all can be masters or slaves, 
depending on start-up order and the failures that have occurred over time.  
Together, these two facts mean that a datasource that can do reconnects is 
required on all brokers.  Therefore it would seem that in the JDBC 
master/slave topology a database restart or temporary loss of database 
connectivity will always result in multiple masters.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
