[
http://opensource.atlassian.com/projects/roller/browse/ROL-1446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
linda skrocki closed ROL-1446.
------------------------------
Resolution: Fixed
Fix Version/s: 4.0
Allen fixed in 4.0.
> Task leasing causes scheduling inconsistencies
> ----------------------------------------------
>
> Key: ROL-1446
> URL:
> http://opensource.atlassian.com/projects/roller/browse/ROL-1446
> Project: Roller
> Issue Type: Bug
> Affects Versions: 3.1
> Reporter: Allen Gilliland
> Assignee: Roller Unassigned
> Fix For: 4.0
>
>
> After a bit more poking around I have realized that some of the problems I've
> seen with task scheduling are actually caused by the leasing process we are
> using. The root of the problem is that task scheduling is not properly
> synchronized with the leasing process, and therefore scheduling drift
> happens.
> An example: assume that a task is scheduled to run once per minute starting at
> 00:00:00.50. The subsequent run times for the task will then be 00:01:00.50,
> 00:02:00.50, etc. Now take into account the fact that in the database the
> lease time of a task is the db time at which the task obtained its lease,
> and that this is some amount of time after the task was actually started. So
> let's assume for a moment that it takes 700ms to obtain a lease via the db.
> This means that the time the db thinks a task was run differs from the time
> the app thinks the task was run, and in our particular example the recorded
> time lands in the next clock second (00:00:00.50 + 700ms = 00:00:01.20).
> What this means is that when the application runs the task the next time, at
> 00:01:00.50, and tries to obtain a new lease, it will be refused because the
> db thinks the last run time for the task was 00:00:01.20, which is only 59.30
> seconds before 00:01:00.50. So the additional time required to obtain a lease
> in the db can cause the lease time to be off by 1 or more seconds and
> therefore cause a subsequent run of the task to fail.
> I have seen this exact problem occur with jobs meant to run once daily, where
> the job runs just after midnight, obtains a lease at 00:00:01.xxx and runs,
> and then the following day the task fails to run because the app thinks that
> the interval for the task has not yet elapsed.
> Sorting this out will require better alignment of the clocks and timestamps
> stored in this process, and this is the best option I can come up with right
> now ...
> When a task successfully obtains a lease and runs, it must keep track of the
> exact time the task was first initiated. Then, when the task completes and
> releases its lease, it stores that time in the db as the last time the lease
> was acquired. This would be a fairly simple way of adjusting the lease time
> stored in the db so that it does not include the additional time required to
> obtain the lease. For example, if a task is set to run hourly starting at
> 05:00 and it obtains its lease at 05:00:01.20, then when the task completes
> we would subtract the 1.20 seconds from the time stored in the db so that the
> db reflects the time the task was run, not the time the lease was obtained.
> I am sure there are other ways to better synchronize the multiple clocks
> involved when doing clustered task scheduling, but at the end of the day it's
> apparent that part of the solution is going to have to involve properly
> accounting for the extra time that gets used up to obtain a lease so that
> scheduling doesn't drift.
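The drift described above, and the proposed adjustment on release, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not Roller's actual code: the `TaskLease` class, its `acquire` and `release` methods, the `simulate` helper, and the fixed 700ms latency are all hypothetical, and the db is modelled as refusing a lease whenever the task's start time is less than one interval after the stored last-run time, as in the example from the description.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(seconds=60)             # task is scheduled once per minute
LEASE_LATENCY = timedelta(milliseconds=700)  # assumed time to obtain a lease via the db


class TaskLease:
    """Hypothetical stand-in for the db record that tracks a task's lease."""

    def __init__(self):
        self.last_run = None  # when the db believes the task last ran

    def acquire(self, started_at):
        """Refuse the lease unless a full interval has passed since last_run."""
        if self.last_run is not None and started_at - self.last_run < INTERVAL:
            return False
        # The db stamps the lease when it is granted, not when the task started.
        self.last_run = started_at + LEASE_LATENCY
        return True

    def release(self, started_at):
        """Proposed adjustment: store the time the task was first initiated."""
        self.last_run = started_at


def simulate(adjust_on_release, runs=3):
    """Return, for each scheduled run, whether the lease was granted."""
    lease = TaskLease()
    first = datetime(2007, 1, 1, 0, 0, 0, 500000)  # 00:00:00.50 on an arbitrary date
    results = []
    for i in range(runs):
        started_at = first + i * INTERVAL
        granted = lease.acquire(started_at)
        if granted and adjust_on_release:
            lease.release(started_at)
        results.append(granted)
    return results


print(simulate(adjust_on_release=False))  # [True, False, True]
print(simulate(adjust_on_release=True))   # [True, True, True]
```

Without the adjustment, the second run is refused because the stored time (00:00:01.20) is only 59.30 seconds before the next start (00:01:00.50). With the start time written back on release, every run sees exactly one full interval and is granted.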
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://opensource.atlassian.com/projects/roller/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira