[ 
https://issues.apache.org/jira/browse/SLING-7778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16545113#comment-16545113
 ] 

Stefan Egli edited comment on SLING-7778 at 7/16/18 12:10 PM:
--------------------------------------------------------------

I've attached [^SLING-7778.patch] which suggests as a fix to add the {{Holder}} 
to the {{scheduledJobs}} map in {{ScheduledJobHandler.addOrUpdateJob}}. The 
idea being that without this, the {{scheduledJobs}} maps between 
{{JobSchedulerImpl}} and {{ScheduledJobHandler}} are not in sync, leading to 
{{ScheduledJobHandler.remove}} not calling {{JobSchedulerImpl.unscheduleJob}}, 
thus leaving a stale scheduled job with the {{JobSchedulerImpl}}.

With this, the synchronization in {{ScheduledJobHandler.addOrUpdateJob}} 
doesn't need to be touched at all.

Note that the patch fixes both cases mentioned

What I'm not sure is, if there was perhaps some reason for not adding it 
originally, therefore I would appreciate if someone ([~cziegeler] ?) could give 
this a review, thx!!


was (Author: egli):
I've attached  [^SLING-7778.patch] which suggests as a fix to add the 
{{Holder}} to the {{scheduledJobs}} map in 
{{ScheduledJobHandler.addOrUpdateJob}}. The idea being that without this, the 
{{scheduledJobs}} maps between {{JobSchedulerImpl}} and {{ScheduledJobHandler}} 
are not in sync, leading to {{ScheduledJobHandler.remove}} not calling 
{{JobSchedulerImpl.unscheduleJob}}, thus leaving a stale scheduled job with the 
{{JobSchedulerImpl}}.

With this, the synchronization in {{ScheduledJobHandler.addOrUpdateJob}} 
doesn't need to be touched at all.

What I'm not sure is, if there was perhaps some reason for not adding it 
originally, therefore I would appreciate if someone ([~cziegeler] ?) could give 
this a review, thx!!

> race-condition in scheduled job creation might keep scheduled job in-memory, 
> and cause a duplicate scheduled job execution
> --------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SLING-7778
>                 URL: https://issues.apache.org/jira/browse/SLING-7778
>             Project: Sling
>          Issue Type: Bug
>          Components: Event
>    Affects Versions: Event 4.2.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>             Fix For: Event 4.2.12
>
>         Attachments: SLING-7778.patch
>
>
> h2. case 1 : (extremely unlikely) observation faster than addOrUpdateJob can 
> even finish
> The creation of a scheduled job is handled in 
> [{{JobSchedulerImpl.addScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L501]
>  which in turn [calls 
> {{scheduledJobHandler.addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L523].
>  There, the job is first persisted to the repository via [calling 
> {{writeScheduledJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L213].
> Normally the sequence of events is then as follows:
>  * [A] the [synchronized block in 
> {{addOrUpdateJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L216]
>  is entered and the job is scheduled with quartz.
>  * [B] then shortly afterwards, observation is triggered and 
> {{handleAddOrUpdate}} is called. This will, as a *side-effect* [add the 
> scheduled job to the {{scheduledJobs}} 
> map|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L446]
>  * the result of this being that the scheduled job is now in 
> {{ScheduledJobHandler.scheduledJobs}}
>  * later on, when the scheduled job is finished, {{handleRemove()}} (or 
> similar) is called, and it checks if it finds the scheduled job in 
> {{ScheduledJobHandler.scheduledJobs}}. if it does, all good. And here we're 
> in the all good case, so all is fine
> However, there's the following race-condition:
>  * first [B] happens, ie first the observation event is triggered, so 
> {{handleAddOrUpdate}} is called, and again the side-effect hits: the 
> scheduled job is added to the {{ScheduledJobHandler.scheduledJobs}}. all good 
> so far
>  * then [A] continues into the synchronized block. *But* now, the [remove of 
> the scheduled 
> job|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L217]
>  actually does damage: the scheduled job ends up *not* being in the 
> {{ScheduledJobHandler.scheduledJobs}} map.
>  * so in this scenario, when a later {{handleRemove()}} (or similar) happens, 
> it can't find the scheduled job (see 
> [here|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L370]
>  h being null)
>  * the result being that 
> [{{JobScheduler.unscheduleJob}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/ScheduledJobHandler.java#L371]
>  is *not invoked*, *leaving the scheduled job with the JobSchedulerImpl*.
> And the result of this race-condition is that 
> {{JobSchedulerImpl.scheduledJobs}} map has a zombie scheduled job, which it 
> shouldn't actually have. Note *this situation can be verified in the 
> /system/console/slingevent webconsole* under "Scheduled Jobs".
> So later on for example, a topology event happens, such as a mere 
> PROPERTIES_CHANGED_EVENT. That in turn will trigger 
> [{{JobSchedulerImpl.configurationChanged()}}|https://github.com/apache/sling-org-apache-sling-event/blob/ca009499cbe30ae30df5069effe051e0bef9ef72/src/main/java/org/apache/sling/event/impl/jobs/scheduling/JobSchedulerImpl.java#L152]
>  which will now *re-schedule the previously scheduled job*. And we have a 
> duplicate scheduled job execution.
> h2. case 2: (likely) scheduled job finishes before observation event triggered
> Besides case 1, there's also the *more likely* case 2: where a scheduled job 
> finishes (thus calls {{ScheduledJobHandler.remove()}}) before the observation 
> event for {{ADDED}} was ever called (leading to 
> {{ScheduledJobHandler.handleAddOrUpdate()}}).
> In this case, the {{remove()}} fails to delete the scheduled job from the 
> (parent) {{JobSchedulerImpl}}, as it doesn't have it in the {{scheduledJob}} 
> list yet.
> Which results - same as case 1 - in an inconsistency leading to a stale 
> {{ScheduledJobInfoImpl}} in {{JobSchedulerImpl.scheduledJobs}} map - which 
> when a PROPERTIES_CHANGED_EVENT is called leads to duplicate scheduled job 
> execution.
> Note that this is only likely to happen if the schedule is set for 'very soon 
> now', eg {{at(NOW+1000)}} or something along those lines. If the schedule is 
> couple minutes in the future, the likelyhood of observation not triggering 
> before drops towards zero.
> So overall, this scenario is likely to happen only with such 'immediate 
> schedules' and could be mitigated by increasing the delay to eg 1 or a couple 
> of minutes.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to