[
https://issues.apache.org/jira/browse/ARTEMIS-2926?focusedWorklogId=499313&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-499313
]
ASF GitHub Bot logged work on ARTEMIS-2926:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 12/Oct/20 10:16
Start Date: 12/Oct/20 10:16
Worklog Time Spent: 10m
Work Description: gemmellr commented on pull request #3287:
URL: https://github.com/apache/activemq-artemis/pull/3287#issuecomment-707028383
The changes seem ok, but I think the PR perhaps overlooks a simpler and
more important behaviour that may be leading to the observed issue.
The given period at construction of the scheduled tasks is documented as
"the delay between the termination of one execution and the start of the next".
That's unsurprisingly consistent with the behaviour of
scheduledExecutorService.scheduleWithFixedDelay(), which is what's used by the
'not onDemand' instances of the scheduled tasks. However, the tasks don't
actually run in the scheduledExecutorService thread if the additional executor
is given during construction. If the second executor is given, the scheduled
task is just offloaded by the scheduledExecutorService for execution on the
provided executor and entirely forgotten about. That seems like it could be the
core of the observed issue to me?
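
To make the offloading concrete, here is a minimal sketch of the pattern, not the actual ActiveMQScheduledComponent code. The names OffloadSketch, QueueingExecutor and offloading() are hypothetical, and the queueing executor is an assumption standing in for a busy second executor so the behaviour is deterministic. The point it illustrates is that the runnable fired by the scheduler returns as soon as the task is handed over, so the scheduler's "termination of one execution" says nothing about when the task really ran.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicInteger;

class OffloadSketch {

    /** Hypothetical executor that only queues work; nothing runs until drain() is called. */
    static final class QueueingExecutor implements Executor {
        private final Queue<Runnable> backlog = new ArrayDeque<>();

        @Override
        public void execute(Runnable command) {
            backlog.add(command);
        }

        int pending() {
            return backlog.size();
        }

        void drain() {
            Runnable next;
            while ((next = backlog.poll()) != null) {
                next.run();
            }
        }
    }

    /**
     * The runnable the scheduler would fire each period. It returns as soon as the
     * task has been handed over, so the scheduler never observes the real run time.
     */
    static Runnable offloading(Executor executor, Runnable task) {
        return () -> executor.execute(task);
    }

    public static void main(String[] args) {
        QueueingExecutor executor = new QueueingExecutor();
        AtomicInteger runs = new AtomicInteger();
        Runnable scheduled = offloading(executor, runs::incrementAndGet);

        // Simulate three scheduler firings while the second executor is busy.
        scheduled.run();
        scheduled.run();
        scheduled.run();
        System.out.println("pending=" + executor.pending() + " runs=" + runs.get());

        // Only now does the second executor get round to the work.
        executor.drain();
        System.out.println("pending=" + executor.pending() + " runs=" + runs.get());
    }
}
```

From the scheduler's point of view all three firings "completed" immediately, even though the task itself had not run once.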
The above means there is no further tracking by the scheduler of when the
task actually runs or how long the given task takes, meaning the periodic
contract is somewhat lost from that point forward. Consider a situation:
1. Say there is a backlog of existing (related or unrelated) things for the
executor still to run, so a new 'scheduled offloaded' task may not run for a
little while until that is processed. Or instead say that thread scheduling
means the second executor doesn't immediately get to executing the task.
Whatever the reason, something means there is a small delay, but eventually the
task does run.
2. A second task instance comes along from the scheduledExecutorService at
some point, very closely after the configured period since it isn't affected by
actual execution of the task, which gets offloaded. Maybe now there isn't any or
as much backlog on the second executor, or there's a better thread scheduling
environment, and this second task may get run relatively quicker than the prior
instance actually did.
3. Due to the 'lastTime' tracking occurring within the task itself, on the
second executor, this second task instance, which was offloaded by the
scheduledExecutorService at its precise period, will now be observed to have
occurred within the configured period of the previous task's 'lastTime' and so
get skipped.
4. This means nothing happened, and won't until the scheduledExecutorService
comes along after a 3rd period and offloads the task another time, by which
point approx double the expected period has elapsed and the task actually
executes. Rinse and repeat this process over and over.
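
The four steps above can be replayed with plain numbers. This is a simplified model under stated assumptions, not the Artemis implementation: SkipSimulation, GuardedTask and runAt() are hypothetical names, timestamps are passed in instead of being read from the clock, and the check is assumed to be "skip if less than one period has passed since 'lastTime'", as the issue description suggests. With a 1000 ms period and offloads at 1000, 2000, 3000 and 4000 ms, the executor is assumed to add a delay that alternates between 50 ms and 5 ms.

```java
import java.util.ArrayList;
import java.util.List;

class SkipSimulation {

    /** Simplified stand-in for the in-task check; times are supplied by the caller. */
    static final class GuardedTask {
        private final long periodMillis;
        private long lastTime; // 0 means "never ran"

        GuardedTask(long periodMillis) {
            this.periodMillis = periodMillis;
        }

        /** Returns true if the task body would execute at the given time, false if skipped. */
        boolean runAt(long nowMillis) {
            if (lastTime > 0 && nowMillis - lastTime < periodMillis) {
                return false; // the "Execution ignored ..." case
            }
            lastTime = nowMillis;
            return true;
        }
    }

    /** Feeds the task the times at which the second executor actually runs it. */
    static List<Boolean> simulate(long periodMillis, long... actualRunTimes) {
        GuardedTask task = new GuardedTask(periodMillis);
        List<Boolean> executed = new ArrayList<>();
        for (long t : actualRunTimes) {
            executed.add(task.runAt(t));
        }
        return executed;
    }

    public static void main(String[] args) {
        // Offloaded every 1000 ms; executor delay alternates 50 ms, 5 ms, 50 ms, 5 ms.
        System.out.println(simulate(1000, 1050, 2005, 3050, 4005));
        // prints [true, false, true, false]
    }
}
```

The second run lands 955 ms after the first and is skipped, the third lands 2000 ms after the first and executes, and the fourth is again 955 ms after the third, so every other execution is lost and the effective period is roughly doubled.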
If the second executor is provided, the task's actual execution and 'lastTime'
period checks are essentially happening independently of the scheduling, and it
seems like the scheduledExecutorService is somewhat blindly throwing tasks
over a wall in the hope that they land at the right time and actually get to
run, as opposed to being skipped and waiting for next time.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
Issue Time Tracking
-------------------
Worklog Id: (was: 499313)
Time Spent: 20m (was: 10m)
> Scheduled task executions are skipped randomly
> ----------------------------------------------
>
> Key: ARTEMIS-2926
> URL: https://issues.apache.org/jira/browse/ARTEMIS-2926
> Project: ActiveMQ Artemis
> Issue Type: Bug
> Components: Broker
> Affects Versions: 2.13.0
> Reporter: Apache Dev
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Scheduled tasks extending {{ActiveMQScheduledComponent}} could randomly skip
> an execution, logging:
> {code}
> Execution ignored due to too many simultaneous executions, probably a
> previous delayed execution
> {code}
> The problem is in the "ActiveMQScheduledComponent#runForExecutor" Runnable.
> Times to be compared ({{currentTimeMillis()}} and {{lastTime}}) are taken
> inside the runnable execution itself. So, depending on relative execution
> times, it could happen that the difference is less than the given period
> (e.g. 1 ms), resulting in a skipped execution.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)