[ https://issues.apache.org/jira/browse/ARTEMIS-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Daniel Lindberg updated ARTEMIS-807:
------------------------------------
    Description:
We are running Artemis 1.4.0 on RHEL 7.2 in a master/slave setup with replication (one master and one slave). We did some failover/failback testing under light load on the broker (15 messages/second). The failover worked without issues and the flow of messages was uninterrupted.
However, on failback we got several exceptions and eventually ended up in a state where both master and backup were down, resulting in our application failing.
I haven't been able to track down the meaning of "File not opened code - 6", but this exception was repeated before we saw "ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ119026: Backup Server was not yet in sync with live]".
Stack trace below:
{noformat}
14:07:23,987 WARN [org.apache.activemq.artemis.journal] AMQ142027: Error on writing data! File not opened code - 6: java.lang.Exception: File not opened
    at org.apache.activemq.artemis.core.io.DummyCallback.onError(DummyCallback.java:36) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.AbstractSequentialFile$DelegateCallback.onError(AbstractSequentialFile.java:296) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.internalWrite(NIOSequentialFile.java:307) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.nio.NIOSequentialFile.writeDirect(NIOSequentialFile.java:277) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.AbstractSequentialFile$LocalBufferObserver.flushBuffer(AbstractSequentialFile.java:324) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flush(TimedBuffer.java:290) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.buffer.TimedBuffer.flush(TimedBuffer.java:262) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.io.AbstractSequentialFileFactory.deactivateBuffer(AbstractSequentialFileFactory.java:156) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.journal.impl.JournalImpl.stop(JournalImpl.java:2121) [artemis-journal-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.stop(JournalStorageManager.java:215) [artemis-server-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.persistence.impl.journal.JournalStorageManager.stop(JournalStorageManager.java:157) [artemis-server-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.replication.ReplicationEndpoint.stop(ReplicationEndpoint.java:339) [artemis-server-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl.stopComponent(ActiveMQServerImpl.java:1038) [artemis-server-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:254) [artemis-server-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2412) [artemis-server-1.4.0.jar:1.4.0]

14:07:24,005 ERROR [org.apache.activemq.artemis.core.server] AMQ224000: Failure in initialisation: ActiveMQIllegalStateException[errorType=ILLEGAL_STATE message=AMQ119026: Backup Server was not yet in sync with live]
    at org.apache.activemq.artemis.core.server.impl.SharedNothingBackupActivation.run(SharedNothingBackupActivation.java:310) [artemis-server-1.4.0.jar:1.4.0]
    at org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl$ActivationThread.run(ActiveMQServerImpl.java:2412) [artemis-server-1.4.0.jar:1.4.0]

14:09:13,332 WARN [org.apache.activemq.artemis.core.client] AMQ212004: Failed to connect to server.
14:09:13,343 INFO [org.apache.activemq.artemis.core.server] AMQ221002: Apache ActiveMQ Artemis Message Broker version 1.4.0 [d8756440-9521-11e6-b058-005056be0eea] stopped, uptime 2 minutes
{noformat}

> "Error on writing data! File not opened code - 6" on failback
> -------------------------------------------------------------
>
>                 Key: ARTEMIS-807
>                 URL: https://issues.apache.org/jira/browse/ARTEMIS-807
>             Project: ActiveMQ Artemis
>          Issue Type: Bug
>          Components: Broker
>    Affects Versions: 1.4.0
>            Reporter: Daniel Lindberg