On Tuesday, April 2, 2013, Carsten Ziegeler wrote:

> Hi,
>
> I'm currently prototyping enhancements in the Sling job handling to allow
> for better scaling in clustered / distributed environments. The current
> implementation makes a lot of assumptions and relies on JCR locks. These
> assumptions, in combination with problems with JCR locks, usually lead to a
> setup where just a single instance in the cluster processes jobs.
> The goal is to be able to run jobs distributed across a cluster, but also
> to be able to process jobs only on specific instances (e.g. to offload some
> heavy jobs to dedicated machines).
>
> Though this is still in an early phase, I would like to run some of the
> potential user-facing changes through this list:
>
> a) Jobs containing queue configurations
> The configuration of job handling is usually done through queue
> configurations. These queues are assigned to one or more job topics and
> have different characteristics, like whether jobs can be processed in
> parallel, how often a job should be retried, the delay between retries,
> etc. The queues are configured globally through OSGi ConfigAdmin and are
> therefore the same on all cluster nodes.
> When we started with the job handling, we didn't have this configuration,
> so each and every job carried all of this information as properties of the
> job itself - which clearly is a maintenance nightmare, but can also lead to
> funny situations where two jobs with the same topic contain different
> configurations (e.g. one allowing parallel processing while the other does
> not).
> With the introduction of the queue configurations, we already reduced the
> per-job configuration possibilities, and in some cases these are already
> ignored.
>
> For the new version I plan to discontinue the per-job configuration of
> queues, as it is simply not worth the effort to support it. And having a
> single source of truth for queue configurations makes maintenance and
> troubleshooting way easier.
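>
> Just to illustrate: one queue is one factory configuration through
> ConfigAdmin. Roughly like this - the factory PID and the property names
> below are from memory and only meant as a sketch:
>
>   Factory PID: org.apache.sling.event.jobs.QueueConfiguration
>
>   queue.name = thumbnail-queue
>   queue.topics = [com/example/jobs/thumbnail]
>   queue.type = UNORDERED      (or ORDERED, TOPIC_ROUND_ROBIN)
>   queue.maxparallel = 4       (jobs processed concurrently)
>   queue.retries = 3           (retries before a job is cancelled)
>   queue.retrydelay = 2000     (milliseconds between retries)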
>
> b) Job API
> Until now, we're leveraging the EventAdmin to add jobs but also to execute
> jobs. While this seemed elegant when we started with job handling, it adds
> another layer to the picture and introduces some uncertainty: e.g. a job
> can be added by sending an event to the event admin, but the sender does
> not know whether this job really arrived at the job manager and/or got
> persisted at all. On the other hand, implementing a job processor based on
> event admin looks more complicated than it should be.
>
> Therefore I think it's time to add a method to the JobManager for adding a
> job - once this method returns, the job has been persisted and will get
> executed. For processing, we make the job processor interface an OSGi
> service interface. Implementations register this service together with the
> topics they are able to process. This makes the implementation easier, but
> also makes it possible to find out which topics can be processed on a given
> cluster node.
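>
> For illustration only, a rough sketch of how both sides could look.
> JobConsumer, JobResult and the exact signature of addJob() are
> assumptions of this proposal, not a finished API:
>
>   import java.util.Collections;
>   import org.apache.sling.event.jobs.Job;
>   import org.apache.sling.event.jobs.JobManager;
>   import org.apache.sling.event.jobs.consumer.JobConsumer;
>
>   // Sketch: the consumer would be registered as an OSGi service with a
>   // service property (e.g. JobConsumer.PROPERTY_TOPICS) listing its topics.
>   public class ThumbnailConsumer implements JobConsumer {
>
>       // Adding a job: when addJob() returns, the job has been persisted
>       // and is guaranteed to be processed (possibly on another instance).
>       public static Job add(final JobManager jobManager) {
>           return jobManager.addJob("com/example/jobs/thumbnail",
>               Collections.<String, Object>singletonMap("path", "/content/a.png"));
>       }
>
>       // Processing: called by the job manager for each job of a
>       // registered topic; the return value drives retry handling.
>       public JobResult process(final Job job) {
>           // create the thumbnail for job.getProperty("path") ...
>           return JobResult.OK; // or JobResult.FAILED to trigger a retry
>       }
>   }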
>
> c) Deprecate event admin based API
> As with b) we don't need the event admin based API anymore and should
> deprecate it - but of course, for compatibility, still support it.
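>
> For reference, this is roughly how adding a job via the event admin looks
> today (the JobUtil constant names are cited from memory):
>
>   import java.util.Dictionary;
>   import java.util.Hashtable;
>   import org.apache.sling.event.jobs.JobUtil;
>   import org.osgi.service.event.Event;
>   import org.osgi.service.event.EventAdmin;
>
>   public final class OldStyleJobs {
>       // Old style: post an event carrying the real job topic as a
>       // property; the sender gets no feedback on whether the job was
>       // picked up by the job manager or persisted at all.
>       public static void addJob(final EventAdmin eventAdmin) {
>           final Dictionary<String, Object> props =
>               new Hashtable<String, Object>();
>           props.put(JobUtil.PROPERTY_JOB_TOPIC, "com/example/jobs/thumbnail");
>           eventAdmin.postEvent(new Event(JobUtil.TOPIC_JOB, props));
>       }
>   }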
>
> WDYT?


I think the event based processing in the current implementation nicely
decouples the processing of jobs, but the implementation lacks a reliable
distributed queue and so is bound to a single node, crucially limiting
scalability. The concept, though not the implementation, reminds me of some
extremely scalable BPM implementations. Rather than attempting to
internalise that within a JobManager implementation, have you considered
addressing a distributed queue, or reusing an off-the-shelf component that
has been proven?

Ian



>
> Regards
> Carsten
> --
> Carsten Ziegeler
> [email protected] <javascript:;>
>
