Hi,

I chatted with Ian last week about making our jobs engine more
scalable (based on his SLING-5646 work) and I think a dead simple API
for batch jobs might be useful, alongside our existing
org.apache.sling.event Jobs API.

Our jobs API provides fine-grained synchronous control over the jobs
that it executes: stopping jobs, querying the engine for job states,
etc.

That's useful for small jobs and page approval workflows, but it's
hard to implement in a scalable and distributed way.

Heavy jobs like digital asset processing, for example, do not need
such fine grained control.

To execute such jobs in a scalable way, a "fire and almost forget"
scenario should work fine: submit a job to process an asset, subscribe
to an events stream about its status, check the latest status received
after a time T, and if the job is not done by then, consider it failed
and submit it again. Make the jobs themselves idempotent for
robustness if needed.

I think this would be useful alongside our existing jobs engine, with
an API that can be as simple as this:

  interface BatchEngine {
    /** Options can be relative priority, preferences for which
        node executes the job, etc. */
    JobId submit(Callable<Void> job, Map<String, Object> options);
  }

  interface BatchEventsSource {
    /** If restrictToSpecificJobIds is not null, the last known state
        of these jobs is resent, if available */
    void registerBatchEventListener(BatchEventsListener listener,
        JobId... restrictToSpecificJobIds);
  }

  interface BatchEventsListener {
    void onEvent(BatchEvent event);
  }

  class BatchEvent {
    JobId getJobId();
    JobStatus getStatus();
    String getInfo();
  }
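
To make the "fire and almost forget" flow concrete, here's a sketch
of how a client could use such an API. InMemoryBatchEngine and
everything inside it are hypothetical stand-ins I made up so the
example is self-contained, not an existing Sling implementation; the
client at the bottom shows the submit / listen / timeout-means-failed
pattern described above:

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BatchSketch {

    enum JobStatus { QUEUED, RUNNING, DONE, FAILED }
    record JobId(String value) {}
    record BatchEvent(JobId jobId, JobStatus status, String info) {}
    interface BatchEventsListener { void onEvent(BatchEvent event); }

    // Toy in-memory engine: runs jobs on a pool, publishes status
    // events, and remembers the last event per job so late listeners
    // can be caught up (the "resend last known state" semantics).
    static class InMemoryBatchEngine {
        private final ExecutorService pool = Executors.newFixedThreadPool(4);
        private final CopyOnWriteArrayList<BatchEventsListener> listeners =
                new CopyOnWriteArrayList<>();
        private final ConcurrentHashMap<JobId, BatchEvent> lastEvent =
                new ConcurrentHashMap<>();

        JobId submit(Callable<Void> job, Map<String, Object> options) {
            JobId id = new JobId(UUID.randomUUID().toString());
            publish(new BatchEvent(id, JobStatus.QUEUED, "queued"));
            pool.submit(() -> {
                publish(new BatchEvent(id, JobStatus.RUNNING, "running"));
                try {
                    job.call();
                    publish(new BatchEvent(id, JobStatus.DONE, "done"));
                } catch (Exception e) {
                    publish(new BatchEvent(id, JobStatus.FAILED, e.toString()));
                }
            });
            return id;
        }

        void registerBatchEventListener(BatchEventsListener listener,
                JobId... restrictToSpecificJobIds) {
            listeners.add(listener);
            // Resend the last known state of the requested jobs, if any
            for (JobId id : restrictToSpecificJobIds) {
                BatchEvent e = lastEvent.get(id);
                if (e != null) listener.onEvent(e);
            }
        }

        private void publish(BatchEvent event) {
            lastEvent.put(event.jobId(), event);
            listeners.forEach(l -> l.onEvent(event));
        }

        void shutdown() { pool.shutdown(); }
    }

    public static void main(String[] args) throws Exception {
        InMemoryBatchEngine engine = new InMemoryBatchEngine();
        JobId id = engine.submit(() -> {
            System.out.println("processing asset");
            return null;
        }, Map.of("priority", "low"));

        // Fire and almost forget: wait up to a time T for a DONE
        // event; on timeout, consider the job failed and resubmit.
        CountDownLatch done = new CountDownLatch(1);
        engine.registerBatchEventListener(e -> {
            if (e.jobId().equals(id) && e.status() == JobStatus.DONE) {
                done.countDown();
            }
        }, id);
        if (done.await(2, TimeUnit.SECONDS)) {
            System.out.println("job done");
        } else {
            System.out.println("not done after T, would resubmit");
        }
        engine.shutdown();
    }
}
```

Because the listener may register after the job already finished, the
replay of the last known state is what keeps the pattern free of a
startup race, and idempotent jobs make the occasional duplicate
resubmission harmless.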

WDYT?

We might also use an existing API if there's a good one, but I think
we don't need more than the above.

-Bertrand
