Sunil Vishwanath created AMQ-6173:
-------------------------------------

Summary: ActiveMQ with replicated LevelDB using NFSv4 corrupts on failover back to the initial instance
Key: AMQ-6173
URL: https://issues.apache.org/jira/browse/AMQ-6173
Project: ActiveMQ
Issue Type: Bug
Components: activemq-leveldb-store
Affects Versions: 5.13.0
Environment: Linux: Installed kernel: 2.6.18-308.0.0.0.1.el5xen x86_64 with NFSv4
Reporter: Sunil Vishwanath

I have set up the following to test with an NFSv4 file system:

ActiveMQ 5.13.0 with LevelDB (3 node cluster).
ZooKeeper 3.4.6 (3 node cluster).
NFSv4 file system local to each server (not shared).

Started up all 3 ZooKeeper nodes.
Started up all 3 ActiveMQ nodes.
As I started aamq2 first, it became the master. I am able to see all the queue statistics via the ActiveMQ Web Console.
I am watching the "application.log" file on all 3 AMQ nodes using the "tail -f application.log" command.

I stopped the aamq2 instance. aamq3 was promoted to master, as shown by the messages in aamq3's application.log.
I restarted aamq2 and its LevelDB caught up.

I stopped the aamq3 instance. aamq1 was promoted to master, as shown by the message in its application log.
I restarted aamq3 and its LevelDB caught up.

I stopped the aamq1 instance. aamq2 was promoted to master, as shown by the messages below, and it encountered errors:

2016-01-31T16:39:20.097313-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:39:20,097 thread=hawtdispatch-DEFAULT-3 category=org.apache.activemq.leveldb.replicated.SlaveLevelDBStore Attaching... Downloaded 66.47/258.72 kb and 5/6 files
2016-01-31T16:39:20.103037-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:39:20,102 thread=hawtdispatch-DEFAULT-3 category=org.apache.activemq.leveldb.replicated.SlaveLevelDBStore Attaching... Downloaded 258.72/258.72 kb and 6/6 files
2016-01-31T16:39:20.104353-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:39:20,104 thread=hawtdispatch-DEFAULT-3 category=org.apache.activemq.leveldb.replicated.SlaveLevelDBStore Attached
2016-01-31T16:46:45.021281-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,020 thread=main-EventThread category=org.apache.activemq.leveldb.replicated.MasterElector Not enough cluster members have reported their update positions yet.
2016-01-31T16:46:45.115987-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,115 thread=main-EventThread category=org.apache.activemq.leveldb.replicated.MasterElector Not enough cluster members have reported their update positions yet.
2016-01-31T16:46:45.188385-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,187 thread=ActiveMQ BrokerService[localhost] Task-4 category=org.apache.activemq.leveldb.replicated.MasterElector Slave stopped
2016-01-31T16:46:45.189199-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,188 thread=ActiveMQ BrokerService[localhost] Task-4 category=org.apache.activemq.leveldb.replicated.MasterElector Not enough cluster members have reported their update positions yet.
2016-01-31T16:46:45.214426-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,214 thread=main-EventThread category=org.apache.activemq.leveldb.replicated.MasterElector Promoted to master
2016-01-31T16:46:45.256560-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,255 thread=ActiveMQ BrokerService[localhost] Task-5 category=org.apache.activemq.leveldb.LevelDBClient Using the pure java LevelDB implementation.
2016-01-31T16:46:45.729608-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,729 thread=LevelDB IOException handler. category=org.apache.activemq.broker.BrokerService No IOExceptionHandler registered, ignoring IO exception
2016-01-31T16:46:45.735717-08:00 aamql2.bus.jetqa1.syseng.tmcs java.io.IOException: java.lang.IllegalArgumentException: File is not a table (bad magic number)
2016-01-31T16:46:45.735717-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
2016-01-31T16:46:45.735752-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:552)
2016-01-31T16:46:45.735752-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:1044)
2016-01-31T16:46:45.735858-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.listCollections(LevelDBClient.scala:1167)
2016-01-31T16:46:45.735858-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.DBManager$$anonfun$3.apply(DBManager.scala:837)
2016-01-31T16:46:45.735877-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.DBManager$$anonfun$3.apply(DBManager.scala:837)
2016-01-31T16:46:45.737812-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.fusesource.hawtdispatch.package$RichExecutorTrait$$anonfun$future$1.apply$mcV$sp(hawtdispatch.scala:116)
2016-01-31T16:46:45.737812-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.fusesource.hawtdispatch.package$$anon$4.run(hawtdispatch.scala:330)
2016-01-31T16:46:45.737846-08:00 aamql2.bus.jetqa1.syseng.tmcs at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2016-01-31T16:46:45.737862-08:00 aamql2.bus.jetqa1.syseng.tmcs at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-01-31T16:46:45.737945-08:00 aamql2.bus.jetqa1.syseng.tmcs at java.lang.Thread.run(Thread.java:745)
2016-01-31T16:46:45.737945-08:00 aamql2.bus.jetqa1.syseng.tmcs by: com.google.common.util.concurrent.UncheckedExecutionException: java.lang.IllegalArgumentException: File is not a table (bad magic number)
2016-01-31T16:46:45.739623-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2256)
2016-01-31T16:46:45.739658-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache.get(LocalCache.java:3980)
2016-01-31T16:46:45.739735-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3984)
2016-01-31T16:46:45.739735-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4868)
2016-01-31T16:46:45.740809-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache.getTable(TableCache.java:80)
2016-01-31T16:46:45.740809-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache.newIterator(TableCache.java:69)
2016-01-31T16:46:45.740886-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache.newIterator(TableCache.java:64)
2016-01-31T16:46:45.741741-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.Version.getLevel0Files(Version.java:139)
2016-01-31T16:46:45.741801-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.DbImpl.internalIterator(DbImpl.java:757)
2016-01-31T16:46:45.742412-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.DbImpl.iterator(DbImpl.java:722)
2016-01-31T16:46:45.742412-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.DbImpl.iterator(DbImpl.java:83)
2016-01-31T16:46:45.742484-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorPrefixed(LevelDBClient.scala:281)
2016-01-31T16:46:45.743294-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$listCollections$1.apply$mcV$sp(LevelDBClient.scala:1171)
2016-01-31T16:46:45.743355-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$listCollections$1.apply(LevelDBClient.scala:1167)
2016-01-31T16:46:45.743980-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$listCollections$1.apply(LevelDBClient.scala:1167)
2016-01-31T16:46:45.743980-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:1038)
2016-01-31T16:46:45.744053-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:1044)
2016-01-31T16:46:45.744872-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:549)
2016-01-31T16:46:45.744935-08:00 aamql2.bus.jetqa1.syseng.tmcs ... 9 more
2016-01-31T16:46:45.744935-08:00 aamql2.bus.jetqa1.syseng.tmcs by: java.lang.IllegalArgumentException: File is not a table (bad magic number)
2016-01-31T16:46:45.745803-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.base.Preconditions.checkArgument(Preconditions.java:92)
2016-01-31T16:46:45.745803-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.table.Footer.readFooter(Footer.java:69)
2016-01-31T16:46:45.745830-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.table.MMapTable.init(MMapTable.java:52)
2016-01-31T16:46:45.745897-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.table.Table.<init>(Table.java:59)
2016-01-31T16:46:45.745897-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.table.MMapTable.<init>(MMapTable.java:44)
2016-01-31T16:46:45.747228-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache$TableAndFile.<init>(TableCache.java:115)
2016-01-31T16:46:45.747228-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache$TableAndFile.<init>(TableCache.java:102)
2016-01-31T16:46:45.747303-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache$1.load(TableCache.java:57)
2016-01-31T16:46:45.747303-08:00 aamql2.bus.jetqa1.syseng.tmcs at org.iq80.leveldb.impl.TableCache$1.load(TableCache.java:54)
2016-01-31T16:46:45.748398-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3579)
2016-01-31T16:46:45.748398-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2372)
2016-01-31T16:46:45.748471-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2335)
2016-01-31T16:46:45.749384-08:00 aamql2.bus.jetqa1.syseng.tmcs at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2250)
2016-01-31T16:46:45.749445-08:00 aamql2.bus.jetqa1.syseng.tmcs ... 26 more
2016-01-31T16:47:45.808014-08:00 aamql2.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:47:45,807 thread=LevelDB IOException handler. category=org.apache.activemq.leveldb.LevelDBStore Stopped LevelDB[/aamql/local/activemq/data/leveldb]

As there were no longer enough servers to form a quorum, the LevelDB store on aamq3 shut down, as shown by the following message:

2016-01-31T16:46:45.095350-08:00 aamql3.bus.jetqa1.syseng.tmcs severity=INFO datetime=2016-01-31 16:46:45,094 thread=ActiveMQ BrokerService[localhost] Task-4 category=org.apache.activemq.leveldb.replicated.MasterElector Slave stopped
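
The innermost exception in the trace above comes from org.iq80.leveldb.table.Footer.readFooter, which rejects a table file whose trailing magic number does not match. As a rough diagnostic sketch (not part of ActiveMQ), the Python script below looks for table files that would fail that check. It assumes the table files use the .sst extension and live somewhere under the store directory, and it relies on the standard LevelDB table format: a 48-byte footer whose last 8 bytes hold the magic number in little-endian order. The helper names has_table_magic and find_bad_tables are made up for illustration.

```python
import os
import struct
from pathlib import Path

# Standard LevelDB table magic number, stored little-endian in the
# last 8 bytes of every table file.
TABLE_MAGIC = 0xDB4775248B80FB57
# Footer = two padded block handles (40 bytes) + 8-byte magic number.
FOOTER_SIZE = 48


def has_table_magic(path):
    """Return True if the file ends with the LevelDB table magic number."""
    path = Path(path)
    if path.stat().st_size < FOOTER_SIZE:
        # Too short to contain a footer at all (e.g. a truncated or empty file).
        return False
    with path.open("rb") as f:
        f.seek(-8, os.SEEK_END)
        (magic,) = struct.unpack("<Q", f.read(8))
    return magic == TABLE_MAGIC


def find_bad_tables(root):
    """Return the sorted *.sst files under root that fail the magic check.

    Assumption: table files are named *.sst and may sit in any subdirectory.
    """
    return sorted(p for p in Path(root).rglob("*.sst") if not has_table_magic(p))
```

For example, calling find_bad_tables("/aamql/local/activemq/data/leveldb") on the failed node (the path is taken from the "Stopped LevelDB" log line) would list every table file whose footer does not end in the expected magic bytes. Empty or truncated files are reported too, which may help tell an incomplete transfer apart from overwritten content.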