Artem Karpenko created AMQ-5618:
-----------------------------------

             Summary: Infinite loop in log replay with Replicated LevelDB
                 Key: AMQ-5618
                 URL: https://issues.apache.org/jira/browse/AMQ-5618
             Project: ActiveMQ
          Issue Type: Bug
    Affects Versions: 5.11.1
         Environment: Linux
            Reporter: Artem Karpenko
            Priority: Critical


This is very similar to AMQ-5300 except that I use replicatedLevelDB 
persistence adapter and in order to reproduce I don't have to delete any index 
files.

Setup: 1 ZK instance, 3 AMQ nodes.
One of the AMQ configs:
{code}
<replicatedLevelDB directory="${activemq.data}/replicatedLevelDB"
                                replicas="3"
                                bind="tcp://0.0.0.0:61619"
                                zkAddress="instance-6:2181"
                                zkPath="/activemq/leveldb-stores"
                                hostname="instance-7" />
{code}
Difference between nodes is only in hostname attribute.

The way to reproduce is almost the same as in AMQ-5300: 
# Produce lots of messages to generate several log files in leveldb data 
directory.
# Consume _some_ messages until you see "Deleting log" in activemq.log.
# Restart master.
# Restart the second master.
# Observe the massive (infinite?) logging on slave and relatively calm but 
still possibly infinite logging on master.

This is what the first master logs after it's restarted:
{code}
2015-02-25 21:37:08,338 | DEBUG | Download session connected... | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:08,582 | INFO  | Slave skipping download of: 
log/00000000190be289.log | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,099 | INFO  | Slave skipping download of: 
log/000000000642f848.log | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,411 | INFO  | Slave skipping download of: 
log/000000000c85f06d.log | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,838 | INFO  | Slave skipping download of: 
log/0000000012c8e921.log | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,842 | INFO  | Slave requested: 
000000001c9373b4.index/CURRENT | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,846 | INFO  | Slave requested: 
000000001c9373b4.index/MANIFEST-000002 | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,850 | INFO  | Slave requested: 
000000001c9373b4.index/000003.log | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,857 | INFO  | Attaching... Downloaded 0.02/95.65 kb and 1/3 
files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,859 | INFO  | Attaching... Downloaded 0.06/95.65 kb and 2/3 
files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,861 | INFO  | Attaching... Downloaded 95.65/95.65 kb and 
3/3 files | org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,862 | INFO  | Attached | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:37:09,878 | DEBUG | Taking a snapshot of the current index: 
/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index 
| org.apache.activemq.leveldb.LevelDBClient | Thread-2
2015-02-25 21:37:10,352 | DEBUG | Recovering from last index snapshot at: 
/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index | 
org.apache.activemq.leveldb.LevelDBClient | Thread-2
{code}

Right after that everything seems fine. But as soon as I stop the new master, 
the another new master (that would be the third one) logs

{code}
2015-02-25 21:38:43,876 | INFO  | Promoted to master | 
org.apache.activemq.leveldb.replicated.MasterElector | main-EventThread
2015-02-25 21:38:43,894 | INFO  | Using the pure java LevelDB implementation. | 
org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
Task-5
2015-02-25 21:38:45,954 | WARN  | Invalid log position: 44 | 
org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
Task-5
2015-02-25 21:38:45,955 | WARN  | Invalid log position: 325280 | 
org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
Task-5
...
2015-02-25 21:38:46,696 | WARN  | Invalid log position: 104726993 | 
org.apache.activemq.leveldb.LevelDBClient | ActiveMQ BrokerService[localhost] 
Task-5
2015-02-25 21:38:46,902 | INFO  | Master started: tcp://instance-8:61619 | 
org.apache.activemq.leveldb.replicated.MasterElector | ActiveMQ 
BrokerService[localhost] Task-6
...
2015-02-25 21:38:52,127 | INFO  | Initializing Spring FrameworkServlet 
'dispatcher' | /admin | main
2015-02-25 21:38:53,181 | INFO  | jolokia-agent: No access restrictor found at 
classpath:/jolokia-access.xml, access to all MBeans is allowed | /api | main
2015-02-25 21:38:56,881 | WARN  | Invalid log position: 0 | 
org.apache.activemq.leveldb.LevelDBClient | Thread-4
2015-02-25 21:39:06,884 | WARN  | Invalid log position: 0 | 
org.apache.activemq.leveldb.LevelDBClient | Thread-4
2015-02-25 21:39:16,887 | WARN  | Invalid log position: 0 | 
org.apache.activemq.leveldb.LevelDBClient | Thread-4
...
{code}
the last message starts repeating every 10 seconds without hint of stopping.
At the same time the one left slave (that was the 1st master) logs
{code}
2015-02-25 21:38:50,124 | INFO  | Attached | 
org.apache.activemq.leveldb.replicated.SlaveLevelDBStore | 
hawtdispatch-DEFAULT-1
2015-02-25 21:38:50,134 | DEBUG | Taking a snapshot of the current index: 
/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000001c9373b4.index 
| org.apache.activemq.leveldb.LevelDBClient | Thread-3
2015-02-25 21:38:50,149 | DEBUG | Recovering from last index snapshot at: 
/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/dirty.index | 
org.apache.activemq.leveldb.LevelDBClient | Thread-3
2015-02-25 21:38:50,189 | DEBUG | Replay of journal from: 0 to 479425460. | 
org.apache.activemq.leveldb.LevelDBClient | Thread-3
2015-02-25 21:38:50,328 | WARN  | No reader available for position: 0, 
log_infos: 
{105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197),
 
210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340),
 
315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520),
 
420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)}
 | org.apache.activemq.leveldb.RecordLog | Thread-3
2015-02-25 21:38:50,332 | WARN  | No reader available for position: 0, 
log_infos: 
{105052232=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000642f848.log,105052232,105052197),
 
210104429=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/000000000c85f06d.log,210104429,105052340),
 
315156769=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/0000000012c8e921.log,315156769,105052520),
 
420209289=LogInfo(/usr/local/apache-activemq-5.11.1/data/replicatedLevelDB/00000000190be289.log,420209289,0)}
 | org.apache.activemq.leveldb.RecordLog | Thread-3
...
{code}
the last message repeats infinitely and every ~1ms, with activemq process 
taking 100% CPU.

If I would now start the stopped node (that was the 2d master) it would log 
exactly the same message with the same intensity.

It should also be noted that while I can observe correct diminished (relatively 
to initial, due to some consumption) count of messages in a queue via console 
after the first restart, the second restart changes that and queue size becomes 
bigger (although I'm not sure if it is equal to the size of queue before any 
consumption).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to