[
https://issues.apache.org/jira/browse/AMQ-5082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14389197#comment-14389197
]
Jim Robinson commented on AMQ-5082:
-----------------------------------
This is from a successful run:
2015-03-31 10:57:48,599 | INFO | ActiveMQ Task-EventThread | Promoted to master
2015-03-31 10:57:48,601 | INFO | ActiveMQ Task | Using the pure java LevelDB implementation.
2015-03-31 10:57:48,621 | INFO | ActiveMQ Task | Master started: tcp://localhost:61227
2015-03-31 10:57:48,625 | INFO | ActiveMQ Task | Using the pure java LevelDB implementation.
2015-03-31 10:57:48,625 | INFO | ActiveMQ Task | Using the pure java LevelDB implementation.
2015-03-31 10:57:48,626 | INFO | ActiveMQ Task | Attaching to master: tcp://localhost:61227
2015-03-31 10:57:48,626 | INFO | ActiveMQ Task | Attaching to master: tcp://localhost:61227
2015-03-31 10:57:48,626 | INFO | ActiveMQ Task | Slave started
2015-03-31 10:57:48,627 | INFO | ActiveMQ Task | Slave started
2015-03-31 10:57:48,883 | INFO | hawtdispatch-DEFAULT-3 | Slave has connected: 03942600-8462-4b44-b40d-e3a398193638
2015-03-31 10:57:49,130 | INFO | hawtdispatch-DEFAULT-4 | Slave has connected: 85f59a26-8faf-4d0c-992c-dc04ffa0e051
2015-03-31 10:57:49,132 | INFO | hawtdispatch-DEFAULT-5 | Attaching... Downloaded 0.00/0.00 kb and 1/1 files
2015-03-31 10:57:49,132 | INFO | hawtdispatch-DEFAULT-5 | Attached
2015-03-31 10:57:49,137 | INFO | hawtdispatch-DEFAULT-7 | Attaching... Downloaded 0.00/0.00 kb and 1/1 files
2015-03-31 10:57:49,137 | INFO | hawtdispatch-DEFAULT-7 | Attached
2015-03-31 10:57:49,142 | INFO | hawtdispatch-DEFAULT-3 | Slave has now caught up: 03942600-8462-4b44-b40d-e3a398193638
2015-03-31 10:57:49,143 | INFO | hawtdispatch-DEFAULT-4 | Slave has now caught up: 85f59a26-8faf-4d0c-992c-dc04ffa0e051
2015-03-31 10:58:15,391 | INFO | Thread-6 | Checking for a single master
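A minimal Python sketch for pulling the replication milestones out of a log like the one above, assuming each entry has been unwrapped onto a single `timestamp | LEVEL | thread | message` line. It collects the `Master started`, `Slave has connected` and `Slave has now caught up` messages; the `ReplicationSummary` class and `summarize` function are hypothetical helpers, not part of ActiveMQ or its test suite.

```python
from dataclasses import dataclass, field

# Unwrapped entries copied from the successful run above (one entry per line).
SUCCESSFUL_RUN = """\
2015-03-31 10:57:48,599 | INFO | ActiveMQ Task-EventThread | Promoted to master
2015-03-31 10:57:48,621 | INFO | ActiveMQ Task | Master started: tcp://localhost:61227
2015-03-31 10:57:48,883 | INFO | hawtdispatch-DEFAULT-3 | Slave has connected: 03942600-8462-4b44-b40d-e3a398193638
2015-03-31 10:57:49,130 | INFO | hawtdispatch-DEFAULT-4 | Slave has connected: 85f59a26-8faf-4d0c-992c-dc04ffa0e051
2015-03-31 10:57:49,142 | INFO | hawtdispatch-DEFAULT-3 | Slave has now caught up: 03942600-8462-4b44-b40d-e3a398193638
2015-03-31 10:57:49,143 | INFO | hawtdispatch-DEFAULT-4 | Slave has now caught up: 85f59a26-8faf-4d0c-992c-dc04ffa0e051
"""

# Message prefixes (taken from the log above) mapped to the summary field they feed.
PREFIXES = {
    "Master started: ": "masters",
    "Slave has connected: ": "connected",
    "Slave has now caught up: ": "caught_up",
}


@dataclass
class ReplicationSummary:
    masters: list = field(default_factory=list)    # replication URLs of started masters
    connected: list = field(default_factory=list)  # ids of slaves that connected
    caught_up: list = field(default_factory=list)  # ids of slaves that finished catching up

    def all_slaves_caught_up(self) -> bool:
        # True only if at least one slave connected and every connected slave caught up.
        return bool(self.connected) and set(self.connected) == set(self.caught_up)


def summarize(log_text: str) -> ReplicationSummary:
    summary = ReplicationSummary()
    for line in log_text.splitlines():
        # Assumed layout: "<timestamp> | <LEVEL> | <thread> | <message>".
        parts = line.split(" | ", 3)
        if len(parts) != 4:
            continue  # not a complete entry (e.g. a wrapped continuation line)
        message = parts[3]
        for prefix, attr in PREFIXES.items():
            if message.startswith(prefix):
                getattr(summary, attr).append(message[len(prefix):].strip())
    return summary


if __name__ == "__main__":
    s = summarize(SUCCESSFUL_RUN)
    print(len(s.masters), len(s.connected), s.all_slaves_caught_up())
```

In this run both connected slaves report catching up to the single master. Applied to the failed run below, the same counts come out equal, but the second slave id (5f12ef48-...) only appears after the ZooKeeper session events.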
And this is from a failed run:
2015-03-31 11:24:58,612 | INFO | ActiveMQ Task-EventThread | Promoted to master
2015-03-31 11:24:58,614 | INFO | ActiveMQ Task | Using the pure java LevelDB implementation.
2015-03-31 11:24:58,631 | INFO | ActiveMQ Task | Master started: tcp://localhost:49958
2015-03-31 11:24:58,636 | INFO | ActiveMQ Task | Using the pure java LevelDB implementation.
2015-03-31 11:24:58,637 | INFO | ActiveMQ Task | Attaching to master: tcp://localhost:49958
2015-03-31 11:24:58,637 | INFO | ActiveMQ Task | Slave started
2015-03-31 11:24:58,936 | INFO | hawtdispatch-DEFAULT-3 | Slave has connected: 143f918b-8ff0-4d8e-ba9b-04bd6644a443
2015-03-31 11:24:58,940 | INFO | hawtdispatch-DEFAULT-4 | Attaching... Downloaded 0.00/0.00 kb and 1/1 files
2015-03-31 11:24:58,940 | INFO | hawtdispatch-DEFAULT-4 | Attached
2015-03-31 11:24:58,949 | INFO | hawtdispatch-DEFAULT-3 | Slave has now caught up: 143f918b-8ff0-4d8e-ba9b-04bd6644a443
2015-03-31 11:24:59,248 | INFO | ActiveMQ Task-SendThread(fe80:0:0:0:0:0:0:1%1:49627) | Socket connection established to fe80:0:0:0:0:0:0:1%1/fe80:0:0:0:0:0:0:1%1:49627, initiating session
2015-03-31 11:24:59,248 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Accepted socket connection from /fe80:0:0:0:0:0:0:1%1:49962
2015-03-31 11:24:59,248 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Client attempting to renew session 0x14c7112e2cc0001 at /fe80:0:0:0:0:0:0:1%1:49962
2015-03-31 11:24:59,249 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Invalid session 0x14c7112e2cc0001 for client /fe80:0:0:0:0:0:0:1%1:49962, probably expired
2015-03-31 11:24:59,249 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Closed socket connection for client /fe80:0:0:0:0:0:0:1%1:49962 which had sessionid 0x14c7112e2cc0001
2015-03-31 11:24:59,249 | INFO | ActiveMQ Task-EventThread | Initiating client connection, connectString=localhost:49627 sessionTimeout=15000 watcher=org.apache.activemq.leveldb.replicated.groups.ZKClient@99b2a1d
2015-03-31 11:24:59,249 | INFO | ActiveMQ Task-EventThread | EventThread shut down
2015-03-31 11:24:59,250 | INFO | ActiveMQ Task-SendThread(localhost:49627)
session
2015-03-31 11:24:59,250 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Accepted socket connection from /127.0.0.1:49963
2015-03-31 11:24:59,250 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Client attempting to establish new session at /127.0.0.1:49963
2015-03-31 11:24:59,252 | INFO | SyncThread:0 | Established session 0x14c7112e2cc0005 with negotiated timeout 10000 for client /127.0.0.1:49963
2015-03-31 11:24:59,252 | INFO | ActiveMQ Task-SendThread(localhost:49627)
sessionid = 0x14c7112e2cc0005, negotiated timeout = 10000
2015-03-31 11:24:59,257 | INFO | ActiveMQ Task | Using the pure java LevelDB implementation.
2015-03-31 11:24:59,258 | INFO | ActiveMQ Task | Attaching to master: tcp://localhost:49958
2015-03-31 11:24:59,260 | INFO | ActiveMQ Task | Slave started
2015-03-31 11:24:59,541 | INFO | hawtdispatch-DEFAULT-5 | Slave has connected: 5f12ef48-86ab-472a-ae1a-5379bf490902
2015-03-31 11:24:59,544 | INFO | hawtdispatch-DEFAULT-6 | Attaching... Downloaded 0.00/0.00 kb and 1/1 files
2015-03-31 11:24:59,544 | INFO | hawtdispatch-DEFAULT-6 | Attached
2015-03-31 11:24:59,553 | INFO | hawtdispatch-DEFAULT-5 | Slave has now caught up: 5f12ef48-86ab-472a-ae1a-5379bf490902
2015-03-31 11:25:25,396 | INFO | Thread-6 | Checking for a single master
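The visible difference in the failed run is a ZooKeeper client reconnect: session 0x14c7112e2cc0001 is reported as "probably expired", and a new session 0x14c7112e2cc0005 is established with a negotiated timeout of 10000 ms even though the client requested sessionTimeout=15000. Below is a small sketch, again assuming unwrapped log entries, that extracts those values. The `clamp_timeout` helper models ZooKeeper's default server-side bounds (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime); the 500 ms tickTime is a hypothetical value used only to show how 15000 could become 10000, since the test's ZooKeeper configuration is not shown here. All function names are hypothetical.

```python
import re

# Unwrapped ZooKeeper-related entries copied from the failed run above.
FAILED_RUN = """\
2015-03-31 11:24:59,249 | INFO | NIOServerCxn.Factory:0.0.0.0/0.0.0.0:0 | Invalid session 0x14c7112e2cc0001 for client /fe80:0:0:0:0:0:0:1%1:49962, probably expired
2015-03-31 11:24:59,249 | INFO | ActiveMQ Task-EventThread | Initiating client connection, connectString=localhost:49627 sessionTimeout=15000 watcher=org.apache.activemq.leveldb.replicated.groups.ZKClient@99b2a1d
2015-03-31 11:24:59,252 | INFO | SyncThread:0 | Established session 0x14c7112e2cc0005 with negotiated timeout 10000 for client /127.0.0.1:49963
"""

# Patterns built from the message texts in the log above.
EXPIRED = re.compile(r"Invalid session (0x[0-9a-f]+) .*probably expired")
REQUESTED = re.compile(r"sessionTimeout=(\d+)")
NEGOTIATED = re.compile(r"Established session (0x[0-9a-f]+) with negotiated timeout (\d+)")


def session_events(log_text: str) -> dict:
    """Collect expired session ids, requested timeouts and negotiated (id, timeout) pairs."""
    events = {"expired": [], "requested_ms": [], "negotiated": []}
    for line in log_text.splitlines():
        if m := EXPIRED.search(line):
            events["expired"].append(m.group(1))
        elif m := REQUESTED.search(line):
            events["requested_ms"].append(int(m.group(1)))
        elif m := NEGOTIATED.search(line):
            events["negotiated"].append((m.group(1), int(m.group(2))))
    return events


def clamp_timeout(requested_ms: int, tick_time_ms: int) -> int:
    """Clamp a requested timeout to ZooKeeper's default [2, 20] * tickTime range."""
    return max(2 * tick_time_ms, min(requested_ms, 20 * tick_time_ms))


if __name__ == "__main__":
    print(session_events(FAILED_RUN))
    # Hypothetical tickTime of 500 ms; the real test setup is not shown.
    print(clamp_timeout(15000, 500))
```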
> ActiveMQ replicatedLevelDB cluster breaks, all nodes stop listening
> -------------------------------------------------------------------
>
> Key: AMQ-5082
> URL: https://issues.apache.org/jira/browse/AMQ-5082
> Project: ActiveMQ
> Issue Type: Bug
> Components: activemq-leveldb-store
> Affects Versions: 5.9.0, 5.10.0
> Reporter: Scott Feldstein
> Assignee: Christian Posta
> Priority: Critical
> Fix For: 5.12.0
>
> Attachments: 03-07.tgz, amq_5082_threads.tar.gz,
> mq-node1-cluster.failure, mq-node2-cluster.failure, mq-node3-cluster.failure,
> zookeeper.out-cluster.failure
>
>
> I have a 3-node AMQ cluster and one ZooKeeper node, using a replicatedLevelDB
> persistence adapter.
> {code}
> <persistenceAdapter>
> <replicatedLevelDB
> directory="${activemq.data}/leveldb"
> replicas="3"
> bind="tcp://0.0.0.0:0"
> zkAddress="zookeep0:2181"
> zkPath="/activemq/leveldb-stores"/>
> </persistenceAdapter>
> {code}
> After about a day of sitting idle, there are cascading failures and the
> cluster stops listening altogether.
> I can reproduce this consistently on 5.9 and the latest 5.10 (commit
> 2360fb859694bacac1e48092e53a56b388e1d2f0). I am going to attach logs from
> the three MQ nodes and the ZooKeeper logs that cover the time when the
> cluster starts having issues.
> The cluster stops listening at Mar 4, 2014 4:56:50 AM (within 5 seconds).
> The OSs are all CentOS 5.9 on one ESX server, so I doubt networking is an
> issue.
> If you need more data, it should be easy to get whatever is needed, since
> the problem is consistently reproducible.
> This bug may be related to AMQ-5026, but looks different enough to file a
> separate issue.
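As a reference point when reading the failure logs, here is a sketch that parses the `replicatedLevelDB` configuration quoted above and derives the quorum size. It assumes the rule from the ActiveMQ replicated LevelDB documentation that at least (replicas/2)+1 nodes must be online to avoid an outage; `quorum_size` and `describe` are hypothetical helper names. With `replicas="3"` that means two of the three brokers must be up, and `bind="tcp://0.0.0.0:0"` (port 0) lets the OS pick the replication port.

```python
import xml.etree.ElementTree as ET

# The persistenceAdapter fragment from the issue description, unchanged.
CONFIG = """
<persistenceAdapter>
  <replicatedLevelDB
    directory="${activemq.data}/leveldb"
    replicas="3"
    bind="tcp://0.0.0.0:0"
    zkAddress="zookeep0:2181"
    zkPath="/activemq/leveldb-stores"/>
</persistenceAdapter>
"""


def quorum_size(replicas: int) -> int:
    """Nodes that must be online: (replicas / 2) + 1, using integer division."""
    return replicas // 2 + 1


def describe(xml_text: str) -> dict:
    root = ET.fromstring(xml_text.strip())
    store = root.find("replicatedLevelDB")
    replicas = int(store.get("replicas"))
    quorum = quorum_size(replicas)
    return {
        "replicas": replicas,
        "quorum": quorum,
        "tolerated_failures": replicas - quorum,
        "zookeeper": store.get("zkAddress"),
        # Port 0 asks the OS for an ephemeral port when this node becomes master.
        "dynamic_replication_port": store.get("bind").endswith(":0"),
    }


if __name__ == "__main__":
    print(describe(CONFIG))
```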
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)