Paul Manning created AMQ-5658:
---------------------------------
Summary: ActiveMQ will not start after KahaDB Corruption due to
"Protocol message contained an invalid tag (zero)" error
Key: AMQ-5658
URL: https://issues.apache.org/jira/browse/AMQ-5658
Project: ActiveMQ
Issue Type: Bug
Components: Message Store
Affects Versions: 5.10.0
Environment: Windows 7
Reporter: Paul Manning
We experienced an ActiveMQ crash in which the KahaDB data files were corrupted.
The machine was powered down abruptly (the plug was pulled).
When the machine restarted, ActiveMQ would not start and the following entries
were in the activemq.log:
2015-03-05 09:25:46,791 | INFO | Corrupt journal records found in
'c:\work\09_git\vc-core\vc-server\build\data\kahadb\db-131.log' between
offsets: 31054572..31231936 |
org.apache.activemq.store.kahadb.disk.journal.Journal | WrapperSimpleAppMain
followed eventually by:
2015-03-05 09:25:48,375 | ERROR | Failed to start Apache ActiveMQ
([broker-USATL-L-008043.americas.abb.com-0, null],
org.apache.activemq.protobuf.InvalidProtocolBufferException: Protocol message
contained an invalid tag (zero).) | org.apache.activemq.broker.BrokerService |
WrapperSimpleAppMain
Removing the .data files and the corrupted db-131.log file allows ActiveMQ to
restart. However, in that case, we experience message loss.
Is it possible to only lose the corrupted record instead of the whole data
file?
Tracing through the code, there does not appear to be any attempt to catch the
InvalidProtocolBufferException and discard the corrupted record.
The exception is raised from CodedInputStream.readTag() during the
MessageDatabase.recover() process.
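
To illustrate the behaviour being asked for, here is a minimal sketch of a
recovery loop that drops only the records that fail to decode. This is not
ActiveMQ code: the class SkipCorruptRecords, the methods parse() and recover(),
and the use of a plain IOException in place of InvalidProtocolBufferException
are all hypothetical stand-ins chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SkipCorruptRecords {

    // Hypothetical stand-in for the protobuf decode step: a record whose
    // first byte (the "tag") is zero, or which is empty, is rejected.
    static String parse(byte[] record) throws IOException {
        if (record.length == 0 || record[0] == 0) {
            throw new IOException("Protocol message contained an invalid tag (zero).");
        }
        return new String(record, StandardCharsets.UTF_8);
    }

    // Replays every record and keeps going when one fails to decode,
    // so a single bad record does not abort the whole recovery.
    static List<String> recover(List<byte[]> journal) {
        List<String> recovered = new ArrayList<>();
        int index = 0;
        for (byte[] record : journal) {
            try {
                recovered.add(parse(record));
            } catch (IOException e) {
                // Discard only this record and report where it was.
                System.err.println("Skipping corrupt record " + index + ": " + e.getMessage());
            }
            index++;
        }
        return recovered;
    }

    public static void main(String[] args) {
        List<byte[]> journal = Arrays.asList(
                "first".getBytes(StandardCharsets.UTF_8),
                new byte[] {0, 0, 0},   // simulated corruption
                "third".getBytes(StandardCharsets.UTF_8));
        System.out.println("Recovered: " + recover(journal));
    }
}
```

Whether skipping a record is safe in the real store depends on how the index
and the journal are kept consistent, which this sketch does not model.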
It is worth noting that we have not been able to reproduce this error. I
imagine that this type of corruption is rare, but is there any way for a user
to recover from it? Are there any tools for this?