On Thu, May 19, 2011 at 12:42:28PM +0100, Michael Hanselmann wrote:
> Am 19. Mai 2011 10:39 schrieb Iustin Pop <ius...@google.com>:
> > On Wed, May 18, 2011 at 04:41:55PM +0200, Michael Hanselmann wrote:
> >> The following is a design proposal for the implementation of
> >> “chained jobs”. It is not yet finished, but before I get into the
> >> technical details I'd like to get some review on the general idea
> >> (see the “TODO” section).
> >
> > Question: how does this help multi-group?
> 
> In response to “multi-relocate” requests iallocators will return a
> list of jobsets which need to be executed to reach the desired result.
> While we can just return that list and let the client handle it
> (similar to OpNodeEvacStrategy), I spent some time thinking of a
> solution usable in other, similar use cases.
> 
> Specifically, I need this for evacuating whole groups. I expect other
> opcodes could make use of it as well, e.g. evacuating a node
> (OpNodeEvacStrategy).
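
[Editor's illustration: a hypothetical shape for such a "multi-relocate"
iallocator result. Jobs within one jobset may run in parallel; a jobset
only starts once the previous one has finished. The opcode strings and
the overall structure are assumptions, not the actual protocol.]

```python
# Hypothetical "multi-relocate" iallocator result: a list of jobsets.
# Opcode names and nesting are assumptions for illustration only.
jobsets = [
    # jobset 1: these two migrations may run concurrently
    [["OP_INSTANCE_MIGRATE inst1"], ["OP_INSTANCE_MIGRATE inst2"]],
    # jobset 2: started only after every job in jobset 1 has finished
    [["OP_INSTANCE_FAILOVER inst3"]],
]

def flatten_in_order(jobsets):
    """Return the job submission order implied by the jobsets."""
    order = []
    for jobset in jobsets:
        # All jobs of one set are submitted before the next set begins.
        order.extend(tuple(ops) for ops in jobset)
    return order
```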

Ack.

My worry is that while this is a very useful addition, it's again work
on a generic framework instead of on actually changing LU locking, which
is a higher priority for getting 2.5 done.

> I want to make it possible to use this feature from within
> LU-generated jobs. Since the job ID isn't known at the time of
> generating the opcodes, there'll need to be some mechanism to describe
> the dependencies. Ah, this is where submitting the jobs directly from
> the LU would've been handy. :-)

:)

> >> +++ b/doc/design-chained-jobs.rst
> >> +One way to work around this limitation is to do some kind of job
> >> +grouping in the client code. Once all jobs of a group have finished, the
> >> +next group is submitted and waited for. There are different kinds of
> >> +clients for Ganeti, some of which don't share code (e.g. Python clients
> >> +vs. htools). This design proposes a solution which would be implemented
> >> +as part of the job queue in the master daemon.
> >
> > FYI, for htools at least, the current solution is working well enough,
> > so I'm not likely to change over. The rationale is that queue management
> > is easier in the current situation as compared to submitting all jobs
> > upfront.
> 
> Of course htools can continue to work as it has so far. This design
> proposes an additional feature.
> 
> >> +Proposed changes
> >> +================
> > […]
> >
> > Question: does this mean one has to submit the first job, get its id,
> > and only then submit the second job where 'depend' contains the id
> > gotten from the first job submit?
> 
> Yes, so far the job ID is the best identifier for a job (unless one
> wants to extend the API and provide a function for submitting jobs
> which need to be executed in a certain order). There's no requirement,
> however, to submit consecutive jobs right away. The “depend” attribute
> just guarantees that a job is executed after the jobs listed in it.
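
[Editor's illustration: the chaining described above might be driven as
below. The queue class is an in-memory stand-in for the master daemon's
job queue, and all names are assumptions rather than the final Ganeti
API.]

```python
# Sketch of chaining jobs via a "depend" attribute. FakeQueue is a
# stand-in that merely records submissions; a real client would talk
# to the Ganeti master daemon instead (assumption).

class FakeQueue:
    """Minimal in-memory queue recording submissions and dependencies."""
    def __init__(self):
        self.submitted = []  # (job_id, opcodes, depend) tuples

    def submit(self, opcodes, depend=()):
        job_id = len(self.submitted) + 1
        self.submitted.append((job_id, opcodes, list(depend)))
        return job_id

def submit_chain(queue, opcode_lists):
    """Submit jobs so each one depends on its predecessor's job ID."""
    job_ids = []
    for ops in opcode_lists:
        # The first job has no dependency; later ones wait for the
        # job submitted just before them.
        depend = job_ids[-1:]
        job_ids.append(queue.submit(ops, depend=depend))
    return job_ids
```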

Ack.

> >> +Client-side logic
> >> +-----------------
> >> +
> >> +There's at least one implementation of a batched job executor twisted
> >> +into the ``burnin`` tool's code. While certainly possible, a client-side
> >> +solution should be avoided due to the different clients already in use.
> >> +For one, the :doc:`remote API <rapi>` client shouldn't import
> >> +non-standard modules. htools are written in Haskell and can't use Python
> >> modules. A batched job executor contains quite a bit of logic.
> >
> > Disagree here (last sentence) :)
> 
> It's more code than a few lines. It needs queueing, checking the results, etc.
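
[Editor's illustration: a minimal sketch of that client-side logic,
with submit/wait injected as stand-ins for a real Ganeti client. All
names here are assumptions, not actual Ganeti API.]

```python
# Sketch of a client-side batched job executor: queue each group of
# jobs, wait for their results, and stop early once a group fails.
# submit() and wait() are injected so the sketch stays self-contained.

def run_job_groups(groups, submit, wait):
    """Run groups of jobs in order; a group starts only after the
    previous one finished, and execution stops at the first failure."""
    finished = []
    for group in groups:
        job_ids = [submit(ops) for ops in group]   # queueing
        results = [wait(jid) for jid in job_ids]   # checking the results
        finished.append(results)
        if not all(ok for ok, _ in results):
            break  # don't submit further groups after a failure
    return finished
```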

Sure. What I meant is not something 'seriously complicated'.

iustin
