[ 
https://issues.apache.org/jira/browse/AMQ-4837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837842#comment-13837842
 ] 

Tenzin giatso commented on AMQ-4837:
------------------------------------

Hi Hiram,
the stack trace is the same, but the environment and the test procedure are 
different.

I can reproduce it in a NON-clustered, non-replicated environment.
The test procedure is simple: I send about 200 messages per second to a 
persistent durable TOPIC. The messages are consumed in real time by a single 
consumer.
A few times per day, AMQ FAILS with the same error I already posted (the same 
as Remo's) and REBOOTS automatically (AMQ 5.9.0 FAILS and only tries to stop).
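
The load pattern described above could be sketched roughly as follows. This is 
only a minimal sketch of the pacing logic, not the code used in the test: the 
class name PacedProducer and the send callback are hypothetical, and the 
callback stands in for an actual JMS send of a persistent message to the topic.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.IntConsumer;

public class PacedProducer {

    /**
     * Calls send.accept(i) for i in [0, total) at roughly ratePerSecond
     * calls per second. Returns the elapsed time in nanoseconds.
     * Stops early if the calling thread is interrupted.
     */
    public static long run(int total, int ratePerSecond, IntConsumer send) {
        long intervalNanos = TimeUnit.SECONDS.toNanos(1) / ratePerSecond;
        long start = System.nanoTime();
        for (int i = 0; i < total; i++) {
            // Schedule each send against the start time so that
            // sleep jitter does not accumulate over a long run.
            long due = start + (long) i * intervalNanos;
            long wait = due - System.nanoTime();
            if (wait > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(wait);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
            send.accept(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        AtomicLong sent = new AtomicLong();
        // Hypothetical placeholder: replace the lambda with a real
        // send of a persistent message to the durable topic.
        long elapsed = run(400, 200, i -> sent.incrementAndGet());
        System.out.println(sent.get() + " messages in "
                + TimeUnit.NANOSECONDS.toMillis(elapsed) + " ms");
    }
}
```

Pacing against absolute due times rather than sleeping a fixed 5 ms after each 
send keeps the average rate close to 200 per second even when individual sends 
are slow.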

My second problem is that after about 10 automatic reboots (each following a 
failure), AMQ becomes very slow and accepts only a few messages per second 
from the producer.

> LevelDB corrupted in AMQ cluster
> --------------------------------
>
>                 Key: AMQ-4837
>                 URL: https://issues.apache.org/jira/browse/AMQ-4837
>             Project: ActiveMQ
>          Issue Type: Bug
>          Components: activemq-leveldb-store
>    Affects Versions: 5.9.0
>         Environment: CentOS, Linux version 2.6.32-71.29.1.el6.x86_64
> java-1.7.0-openjdk.x86_64/java-1.6.0-openjdk.x86_64
> zookeeper-3.4.5.2
>            Reporter: Guillaume
>            Assignee: Hiram Chirino
>            Priority: Critical
>         Attachments: LevelDBCorrupted.zip, activemq.xml
>
>
> I have clustered 3 ActiveMQ instances using replicated leveldb and zookeeper. 
> When performing some tests using the Web UI, I came across issues that appear 
> to corrupt the leveldb data files.
> The issue can be replicated by performing the following steps:
> 1.    Start 3 activemq nodes.
> 2.    Push a message to the master (Node1) and browse the queue using the web 
> UI
> 3.    Stop master node (Node1)
> 4.    Push a message to the new master (Node2) and browse the queue using the 
> web UI. Message summary and queue content ok.
> 5.    Start Node1
> 6.    Stop master node (Node2)
> 7.    Browse the queue using the web UI on new master (Node3). Message 
> summary ok however when clicking on the queue, no message details. An error 
> (see below) is logged by the master, which attempts a restart.
> From this point, the database appears to be corrupted and the same error 
> occurs on every node indefinitely (shutdown/restart). The only workaround is 
> to stop the nodes and clear the data files.
> However when a message is pushed between step 5 and 6, the error doesn’t 
> occur.
> =================================
> Leveldb configuration on the 3 instances:
>               <persistenceAdapter>
>                       <replicatedLevelDB
>                                       directory="${activemq.data}/leveldb"
>                                       replicas="3"
>                                       bind="tcp://0.0.0.0:0"
>                                       zkAddress="zkserver:2181"
>                                       zkPath="/activemq/leveldb-stores"
>                                       />
>               </persistenceAdapter>
> =================================
> The error is:
> INFO | Stopping BrokerService[localhost] due to exception, java.io.IOException
> java.io.IOException
>         at org.apache.activemq.util.IOExceptionSupport.create(IOExceptionSupport.java:39)
>         at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:543)
>         at org.apache.activemq.leveldb.LevelDBClient.might_fail_using_index(LevelDBClient.scala:974)
>         at org.apache.activemq.leveldb.LevelDBClient.collectionCursor(LevelDBClient.scala:1270)
>         at org.apache.activemq.leveldb.LevelDBClient.queueCursor(LevelDBClient.scala:1194)
>         at org.apache.activemq.leveldb.DBManager.cursorMessages(DBManager.scala:708)
>         at org.apache.activemq.leveldb.LevelDBStore$LevelDBMessageStore.recoverNextMessages(LevelDBStore.scala:741)
>         at org.apache.activemq.broker.region.cursors.QueueStorePrefetch.doFillBatch(QueueStorePrefetch.java:106)
>         at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.fillBatch(AbstractStoreCursor.java:258)
>         at org.apache.activemq.broker.region.cursors.AbstractStoreCursor.reset(AbstractStoreCursor.java:108)
>         at org.apache.activemq.broker.region.cursors.StoreQueueCursor.reset(StoreQueueCursor.java:157)
>         at org.apache.activemq.broker.region.Queue.doPageInForDispatch(Queue.java:1875)
>         at org.apache.activemq.broker.region.Queue.pageInMessages(Queue.java:2086)
>         at org.apache.activemq.broker.region.Queue.iterate(Queue.java:1581)
>         at org.apache.activemq.thread.PooledTaskRunner.runTask(PooledTaskRunner.java:129)
>         at org.apache.activemq.thread.PooledTaskRunner$1.run(PooledTaskRunner.java:47)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> Caused by: java.lang.NullPointerException
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1198)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$queueCursor$1.apply(LevelDBClient.scala:1194)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1272)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1$$anonfun$apply$mcV$sp$12.apply(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient$RichDB.check$4(LevelDBClient.scala:315)
>         at org.apache.activemq.leveldb.LevelDBClient$RichDB.cursorRange(LevelDBClient.scala:317)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply$mcV$sp(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$collectionCursor$1.apply(LevelDBClient.scala:1271)
>         at org.apache.activemq.leveldb.LevelDBClient.usingIndex(LevelDBClient.scala:968)
>         at org.apache.activemq.leveldb.LevelDBClient$$anonfun$might_fail_using_index$1.apply(LevelDBClient.scala:974)
>         at org.apache.activemq.leveldb.LevelDBClient.might_fail(LevelDBClient.scala:540)
>         ... 17 more



--
This message was sent by Atlassian JIRA
(v6.1#6144)
