[ https://issues.apache.org/jira/browse/DERBY-3364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12571136#action_12571136 ]
narayanan edited comment on DERBY-3364 at 2/21/08 10:41 AM:
--------------------------------------------------------------

Thank you for the ctrl+c tip, Jorgen. Please find below an explanation of the changes to each file, followed by a run of the attached repro.

General Failure Analysis
--------------------------------------

I observed several inconsistencies across the runs:

1) The first failover call succeeds, reporting that it was successful.
2) The second failover call hangs.
3) exit hangs after the first failover.

These were due to a combination of causes:

a) The read on the InputStream obtained from the client socket was not timing out.
b) The Transmitter should close the socket when a failover is successful or unsuccessful. (This is a client socket, so I am not sure this was a big problem.)
c) The log shipper thread should be terminated upon failover failure or success.

Files Modified and Explanation
-----------------------------------------------

M java/engine/org/apache/derby/impl/services/replication/net/ReplicationMessageTransmit.java
  * Set a timeout on the socket, which takes effect as a timeout on reads from its input streams.
  * Add a method to tear down the socket obtained.

M java/engine/org/apache/derby/impl/services/replication/master/MasterController.java
  * Handle the IOException thrown from stopLogShipment, which results from the exception thrown by tearDown.
  * Stop the log shipper when failover fails, in addition to stopping it when failover succeeds.

M java/engine/org/apache/derby/impl/services/replication/master/AsynchronousLogShipper.java
  * Make the log shipper tear down the socket and the streams obtained from the socket.
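A minimal sketch of the two ReplicationMessageTransmit changes, assuming a plain java.net.Socket. The class TimedTransmitter, its readAck/tearDown/isClosed methods and the single-byte acknowledgement are hypothetical stand-ins, not Derby's actual code. The key call is Socket.setSoTimeout: once set, a blocking read on the socket's InputStream fails with SocketTimeoutException instead of waiting forever for a slave that is gone.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

/**
 * Hypothetical sketch, not Derby's ReplicationMessageTransmit: a client-side
 * transmitter whose reads time out instead of blocking forever, and which can
 * tear down its socket and streams.
 */
class TimedTransmitter {
    private final Socket socket;
    private final InputStream in;
    private final OutputStream out;

    TimedTransmitter(String host, int port, int timeoutMillis) throws IOException {
        socket = new Socket();
        // Bound the connect attempt too (assumption: same value, for simplicity).
        socket.connect(new InetSocketAddress(host, port), timeoutMillis);
        // SO_TIMEOUT: a read() on this socket's InputStream that blocks longer
        // than timeoutMillis throws java.net.SocketTimeoutException.
        socket.setSoTimeout(timeoutMillis);
        in = socket.getInputStream();
        out = socket.getOutputStream();
    }

    /** Reads one byte as a stand-in for an acknowledgement message. */
    int readAck() throws IOException {
        int b = in.read(); // SocketTimeoutException if the peer stays silent
        if (b < 0) {
            throw new IOException("connection closed by peer");
        }
        return b;
    }

    /** Closes the streams and then the socket; safe to call more than once. */
    void tearDown() throws IOException {
        IOException first = null;
        for (Closeable c : new Closeable[] {in, out, socket}) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e; // remember the first failure, keep closing the rest
                }
            }
        }
        if (first != null) {
            throw first;
        }
    }

    boolean isClosed() {
        return socket.isClosed();
    }
}
```

Using one value for both the connect and read timeouts is only a simplification in this sketch; a real implementation would likely tune them separately.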
Repro Runs
-------------------

Embedded
--------
ij version 10.4
ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;create=true';
ij> call syscs_util.syscs_freeze_database();
0 rows inserted/updated/deleted
ij> connect 'jdbc:derby:masterDB;user=oystein;password=pass;startMaster=true;slaveHost=localhost';
(Did a ctrl+c on the slave here.)
ij(CONNECTION1)> connect 'jdbc:derby:masterDB;user=oystein;password=pass;failover=true';
ERROR XRE21: Error occurred while performing failover for database 'masterDB', Failover attempt was aborted.
ij(CONNECTION1)> exit;

Client
------
ij version 10.4
ij> connect 'jdbc:derby://localhost:1527/replicationdb';
ij> connect 'jdbc:derby://localhost:1527/replicationdb;startMaster=true;slaveHost=localhost;slavePort=8001';
(Did a ctrl+c on the slave here.)
ij(CONNECTION1)> connect 'jdbc:derby://localhost:1527/replicationdb;failover=true';
ERROR XRE21: DERBY SQL error: SQLCODE: -1, SQLSTATE: XRE21, SQLERRMC: replicationdbXRE21
ij(CONNECTION1)> exit;

This patch will clash with DERBY-3428. Whichever patch is committed first will break the other.
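A minimal sketch of the MasterController change, assuming hypothetical stand-ins: the interfaces AckSource and Shipper, the FAILOVER_ACK value and this startFailover signature are illustrative only and are not Derby's real classes or API. It shows the control flow the comment describes: a bounded wait for the slave's acknowledgement (the read times out rather than hangs), and stopping the log shipper in a finally block so that it happens whether failover succeeds or fails, with an IOException from stopping it handled rather than propagated.

```java
import java.io.IOException;

/** Hypothetical sketch of the failover control flow, not Derby's MasterController. */
class FailoverSketch {
    /** Stand-in for the transmitter; readAck may throw SocketTimeoutException. */
    interface AckSource {
        int readAck() throws IOException;
    }

    /** Stand-in for the log shipper; stopping it tears down the socket. */
    interface Shipper {
        void stopLogShipment() throws IOException;
    }

    static final int FAILOVER_ACK = 1; // hypothetical acknowledgement code

    /**
     * Returns true if the slave acknowledged failover. The log shipper is
     * stopped on both the success and the failure path.
     */
    static boolean startFailover(AckSource transmitter, Shipper shipper) {
        boolean succeeded;
        try {
            // With SO_TIMEOUT set on the socket, this fails with an exception
            // instead of hanging when the slave has been stopped.
            succeeded = transmitter.readAck() == FAILOVER_ACK;
        } catch (IOException e) {
            succeeded = false; // slave unreachable or silent: abort failover
        } finally {
            try {
                shipper.stopLogShipment();
            } catch (IOException e) {
                // This sketch only reports the teardown failure on stderr;
                // the failover outcome was already decided above.
                System.err.println("stopLogShipment failed: " + e.getMessage());
            }
        }
        return succeeded;
    }
}
```

Putting the shutdown in finally is one way to guarantee that both paths stop the shipper; the actual patch may structure this differently.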
> Replication failover implementation must be modified to fail at the master after slave has been stopped
> -------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3364
>                 URL: https://issues.apache.org/jira/browse/DERBY-3364
>             Project: Derby
>          Issue Type: Bug
>          Components: Replication
>    Affects Versions: 10.4.0.0
>            Reporter: V.Narayanan
>            Assignee: V.Narayanan
>         Attachments: Derby3364_v1.diff, Derby3364_v1.stat
>
>
> Jorgen says...
> I tried to run the failover command on the master, which seems to work fine
> as long as the master and slave are still connected. If the slave has been
> stopped for some reason, however, failover hangs on
> MasterController#startFailover here:
> ack = transmitter.readMessage();

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.