[ https://issues.apache.org/jira/browse/AMQ-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13825318#comment-13825318 ]
Hiram Chirino commented on AMQ-4837:
------------------------------------
Tenzin,
Does it also happen with the following build?
https://repository.apache.org/content/repositories/snapshots/org/apache/activemq/apache-activemq/5.10-SNAPSHOT/apache-activemq-5.10-20131106.134045-17-bin.tar.gz
> LevelDB corrupted in AMQ cluster
> --------------------------------
>
> Key: AMQ-4837
> URL: https://issues.apache.org/jira/browse/AMQ-4837
> Project: ActiveMQ
> Issue Type: Bug
> Components: activemq-leveldb-store
> Affects Versions: 5.9.0
> Environment: CentOS, Linux version 2.6.32-71.29.1.el6.x86_64
> java-1.7.0-openjdk.x86_64/java-1.6.0-openjdk.x86_64
> zookeeper-3.4.5.2
> Reporter: Guillaume
> Assignee: Hiram Chirino
> Priority: Critical
> Attachments: LevelDBCorrupted.zip, activemq.xml
>
>
> I have clustered 3 ActiveMQ instances using replicated LevelDB and ZooKeeper.
> When performing some tests using the Web UI, I came across issues that appear
> to corrupt the LevelDB data files.
> The issue can be replicated by performing the following steps:
> 1. Start 3 activemq nodes.
> 2. Push a message to the master (Node1) and browse the queue using the web
> UI
> 3. Stop master node (Node1)
> 4. Push a message to the new master (Node2) and browse the queue using the
> web UI. Message summary and queue content ok.
> 5. Start Node1
> 6. Stop master node (Node2)
> 7. Browse the queue using the web UI on the new master (Node3). The message
> summary is ok; however, clicking on the queue shows no message details. An
> error (see below) is logged by the master, which then attempts a restart.
> From this point on, the database appears to be corrupted and the same error
> occurs on every node indefinitely (shutdown/restart loop). The only workaround
> is to stop the nodes and clear the data files.
> However, when a message is pushed between steps 5 and 6, the error doesn’t
> occur.
> =================================
> Leveldb configuration on the 3 instances:
> <persistenceAdapter>
>   <replicatedLevelDB
>     directory="${activemq.data}/leveldb"
>     replicas="3"
>     bind="tcp://0.0.0.0:0"
>     zkAddress="zkserver:2181"
>     zkPath="/activemq/leveldb-stores"
>   />
> </persistenceAdapter>
> =================================
> The error is:
> INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
> java.io.IOException
>     at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
>     at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
>     at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
>     at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
>     at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
>     at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
>     at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)
>     at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
>     at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:258)
>     at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
>     at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>     at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1875)
>     at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2086)
>     at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1581)
>     at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
>     at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1198)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1194)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1272)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:315)
>     at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:317)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>     at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
>     at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
>     at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
>     ... 17 more
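
The quoted reproduction steps push messages through the web UI of whichever node is currently master. If the test were driven from a client instead, one option would be ActiveMQ's failover transport, so the same client follows the master as it moves from Node1 to Node2 to Node3. The sketch below only builds such a failover URI. The host names node1, node2 and node3, the FailoverUri class and the use of port 61616 (the default OpenWire port) are assumptions for illustration and do not come from the report.

```java
import java.util.List;
import java.util.stream.Collectors;

// Builds an ActiveMQ failover transport URI from a list of broker hosts.
// Host names and port are placeholders, not values taken from the report.
final class FailoverUri {

    // 61616 is the default OpenWire port; adjust it if activemq.xml differs.
    static final int DEFAULT_OPENWIRE_PORT = 61616;

    static String build(List<String> hosts, int port) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("at least one broker host is required");
        }
        // Produces failover:(tcp://host1:port,tcp://host2:port,...)
        return hosts.stream()
                .map(host -> "tcp://" + host + ":" + port)
                .collect(Collectors.joining(",", "failover:(", ")"));
    }

    public static void main(String[] args) {
        // Prints the URI for the three hypothetical nodes of the cluster.
        System.out.println(build(List.of("node1", "node2", "node3"), DEFAULT_OPENWIRE_PORT));
    }
}
```

The resulting string is the kind of broker URL a JMS client would hand to org.apache.activemq.ActiveMQConnectionFactory. Whether a client-driven test triggers the same error as the web UI is not established by the report.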