[
https://issues.apache.org/jira/browse/AMQ-7108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16695340#comment-16695340
]
Ben D'Herville commented on AMQ-7108:
-------------------------------------
Just a thought and I haven't looked into the activemq code in detail but it
seems like the data kahadb is reading in is incorrect which may indicate that a
race condition occurred writing the data to disk. i.e. is it possible that a
KahaAddScheduledJobCommand and a KahaTraceCommand were stored at the same time
and were given the same primary key?
> Scheduled job scheduling occasionally fails with
> java.lang.ClassCastException:
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: AMQ-7108
> URL: https://issues.apache.org/jira/browse/AMQ-7108
> Project: ActiveMQ
> Issue Type: Bug
> Components: Job Scheduler, KahaDB
> Affects Versions: 5.11.1
> Environment: *OS*: Linux ip-10-88-2-47 3.13.0-125-generic #174-Ubuntu
> SMP Mon Jul 10 18:51:24 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
> *AMQ*: 5.11.1
> *glusterfs*: 3.8.15
> *java*: 1.8.0_144-b01
>
> Reporter: Ben D'Herville
> Priority: Major
> Attachments: activemq.xml
>
>
> *Issue*
> We have been running the stack with the same stated versions mentioned in
> this ticket's environment details for over a year. Recently we have
> experienced ActiveMQ outages as a result of the exception below. There has
> been a increase in load but not significant.
> Scheduled messages are being used as a polling mechanism in a workflow
> system. A workflow will execute a step then schedule a small object message,
> containing just an identifier, to check for a result in a number of seconds.
> The consumer of the message will schedule another message if the task is not
> complete yet. We also have a batch process that can start a number of
> workflows concurrently. Usually in the order of 100's which means 100's of
> scheduled messages can be created concurrently.
> The most recent failure occurred with a batch of only 64.
> We haven't noticed any data loss as a result of this but it doesn't cause a
> complete outage of our production system.
> *Recovery*
> The only way we have been able to recover from this error is to restart the
> broker. kahadb is configured to check for corrupt journal which would
> recover the journal on startup. It is unclear if the journal is corrupted
> though.
> *Reproduction*
> We have tried to reproduce this issue using the ActiveMQ performance maven
> plugin but was unable to. We had 10000 producers concurrently schedule 500
> messages each.
>
> Note there are two stack traces here. The second one is possibly a
> side-effect of the first one.
> {code:java}
> 2018-Nov-21 15:16:48 JobScheduler:JMS
> [org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl] ERROR: JMS
> Failed to schedule job
> java.lang.ClassCastException:
> org.apache.activemq.store.kahadb.data.KahaTraceCommand cannot be cast to
> org.apache.activemq.store.kahadb.data.KahaAddScheduledJobCommand
> at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerStoreImpl.getPayload(JobSchedulerStoreImpl.java:529)
> at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.fireJob(JobSchedulerImpl.java:789)
> at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.mainLoop(JobSchedulerImpl.java:720)
> at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.run(JobSchedulerImpl.java:673)
> at java.lang.Thread.run(Thread.java:748)
> 2018-Nov-21 15:24:45 ActiveMQ Transport: tcp:///10.88.2.113:54256@61616
> [org.apache.activemq.transaction.LocalTransaction] WARN: POST COMMIT FAILED:
> java.lang.NullPointerException
> at
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:482)
> at
> org.apache.activemq.store.kahadb.AbstractKahaDBStore.store(AbstractKahaDBStore.java:394)
> at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.doSchedule(JobSchedulerImpl.java:264)
> at
> org.apache.activemq.store.kahadb.scheduler.JobSchedulerImpl.schedule(JobSchedulerImpl.java:99)
> at
> org.apache.activemq.broker.scheduler.SchedulerBroker.doSchedule(SchedulerBroker.java:197)
> at
> org.apache.activemq.broker.scheduler.SchedulerBroker.access$000(SchedulerBroker.java:48)
> at
> org.apache.activemq.broker.scheduler.SchedulerBroker$1.afterCommit(SchedulerBroker.java:162)
> at
> org.apache.activemq.transaction.Transaction.fireAfterCommit(Transaction.java:126)
> at
> org.apache.activemq.transaction.Transaction.doPostCommit(Transaction.java:195)
> at
> org.apache.activemq.transaction.Transaction$2.call(Transaction.java:55)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:990)
> at
> org.apache.activemq.store.kahadb.MessageDatabase.store(MessageDatabase.java:957)
> at
> org.apache.activemq.store.kahadb.KahaDBTransactionStore.commit(KahaDBTransactionStore.java:298)
> at
> org.apache.activemq.transaction.LocalTransaction.commit(LocalTransaction.java:70)
> at
> org.apache.activemq.broker.TransactionBroker.commitTransaction(TransactionBroker.java:253){code}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)