[
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Ben D'Herville updated AMQ-7108:
--------------------------------
Description:
*Issue*
We have been running the stack with the same stated versions mentioned in this
ticket's environment details for over a year. Recently we have experienced
ActiveMQ outages as a result of the exception below. There has been a increase
in load but not significant.
Scheduled messages are being used as a polling mechanism in a workflow system.
A workflow will execute a step then schedule a small object message, containing
just an identifier, to check for a result in a number of seconds. The consumer
of the message will schedule another message if the task is not complete yet.
We also have a batch process that can start a number of workflows concurrently.
Usually in the order of 100's which means 100's of scheduled messages can be
created concurrently.
The most recent failure occurred with a batch of only 64.
We haven't noticed any data loss as a result of this but it doesn't cause a
complete outage of our production system.
*Recovery*
The only way we have been able to recover from this error is to restart the
broker. kahadb is configured to check for corrupt journal which would recover
the journal on startup. It is unclear if the journal is corrupted though.
*Reproduction*
We have tried to reproduce this issue using the ActiveMQ performance maven
plugin but was unable to. We had 10000 producers concurrently schedule 500
messages each.
{{Note there are two stack traces here. The second one is possibly a
side-effect of the first one.}}
{{ 2018-Nov-21 15:16:48 JobScheduler:JMS
[org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed
to schedule job}}
{{ java.lang.ClassCastException:
org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to
org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand}}
{{ at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)}}
{{ at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)}}
{{ at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)}}
{{ at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)}}
{{ at java.lang.Thread.run(Thread.java:748)}}{{2018-Nov-21 15:24:45 ActiveMQ
Transport: tcp:///10.88.2.113:54256@61616
[org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT
FAILED:java.lang.NullPointerException}}
{{ at
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)}}
{{ at
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)}}
{{ at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)}}
{{ at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)}}
{{ at
org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)}}
{{ at
org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)}}
{{ at
org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)}}
{{ at
org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)}}
{{ at
org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)}}
{{ at org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)}}
{{ at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
{{ at
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)}}
{{ at
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)}}
{{ at
org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)}}
{{ at
org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)}}
{{ at
org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)}}
was:
*Issue*
We have been running the stack with the same stated versions mentioned in this
ticket's environment details for over a year. Recently we have experienced
ActiveMQ outages as a result of the exception below. There has been a increase
in load but not significant.
Scheduled messages are being used as a polling mechanism in a workflow system.
A workflow will execute a step then schedule a small object message, containing
just an identifier, to check for a result in a number of seconds. The consumer
of the message will schedule another message if the task is not complete yet.
We also have a batch process that can start a number of workflows concurrently.
Usually in the order of 100's which means 100's of scheduled messages can be
created concurrently.
The most recent failure occurred with a batch of only 64.
We haven't noticed any data loss as a result of this but it doesn't cause a
complete outage of our production system.
*Recovery*
The only way we have been able to recover from this error is to restart the
broker. kahadb is configured to check for corrupt journal which would recover
the journal on startup. It is unclear if the journal is corrupted though.
*Reproduction*
We have tried to reproduce this issue using the ActiveMQ performance maven
plugin but was unable to. We had 10000 producers concurrently schedule 500
messages each.
Note there are two stack traces here. The second one is possibly a side-effect
of the first one.
2018-Nov-21 15:16:48 JobScheduler:JMS
[org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS Failed
to schedule job
java.lang.ClassCastException:
org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to
org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
at java.lang.Thread.run(Thread.java:748)
2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616
[org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT
FAILED:java.lang.NullPointerException
at
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
at
org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
at
org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
at
org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
at
org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
at
org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
at
org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
at
org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
at
org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
at
org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
at
org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
at
org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
at
org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)
> Scheduled job scheduling occasionally fails with
> java.lang.ClassCastException:
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: AMQ-7108
> URL: https://issues.apache.org/jira/browse/AMQ-7108
> Project: ActiveMQ
> Issue Type: Bug
> Components: Job Scheduler, KahaDB
> Affects Versions: 5.11.1
> Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu
> SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>
> Reporter: Ben D'Herville
> Priority: Major
> Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the same stated versions mentioned in
> this ticket's environment details for over a year. Recently we have
> experienced ActiveMQ outages as a result of the exception below. There has
> been a increase in load but not significant.
> Scheduled messages are being used as a polling mechanism in a workflow
> system. A workflow will execute a step then schedule a small object message,
> containing just an identifier, to check for a result in a number of seconds.
> The consumer of the message will schedule another message if the task is not
> complete yet. We also have a batch process that can start a number of
> workflows concurrently. Usually in the order of 100's which means 100's of
> scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this but it doesn't cause a
> complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the
> broker. kahadb is configured to check for corrupt journal which would
> recover the journal on startup. It is unclear if the journal is corrupted
> though.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance maven
> plugin but was unable to. We had 10000 producers concurrently schedule 500
> messages each.
>
> {{Note there are two stack traces here. The second one is possibly a
> side-effect of the first one.}}
> {{ 2018-Nov-21 15:16:48 JobScheduler:JMS
> [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS
> Failed to schedule job}}
> {{ java.lang.ClassCastException:
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand}}
> {{ at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)}}
> {{ at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)}}
> {{ at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)}}
> {{ at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}{{2018-Nov-21 15:24:45 ActiveMQ
> Transport: tcp:///10.88.2.113:54256@61616
> [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT
> FAILED:java.lang.NullPointerException}}
> {{ at
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)}}
> {{ at
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)}}
> {{ at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)}}
> {{ at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)}}
> {{ at
> org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)}}
> {{ at
> org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)}}
> {{ at
> org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)}}
> {{ at
> org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)}}
> {{ at
> org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)}}
> {{ at
> org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)}}
> {{ at java.util.concurrent.FutureTask.run(FutureTask.java:266)}}
> {{ at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)}}
> {{ at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)}}
> {{ at
> org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)}}
> {{ at
> org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)}}
> {{ at
> org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253)}}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)