On 19 May 2011 10:39, Iustin Pop <ius...@google.com> wrote:
> On Wed, May 18, 2011 at 04:41:55PM +0200, Michael Hanselmann wrote:
>> The following is a design proposal for the implementation of “chained
>> jobs”. It is not yet finished, but before I get into the technical
>> details I'd like to get some review on the general idea (see the
>> “TODO” section).
>
> Question: how does this help multi-group?

In response to “multi-relocate” requests, iallocators will return a
list of jobsets which need to be executed to reach the desired result.
While we could just return that list and let the client handle it
(similar to OpNodeEvacStrategy), I spent some time thinking about a
solution usable in other, similar use cases.
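
To make that more concrete, here's a purely illustrative sketch of
such a result; the "jobsets" key, the opcode names and their
parameters are assumptions, not a finalized iallocator protocol:

  # Illustrative only; field names and opcodes are made up.
  example_result = {
    "jobsets": [
      # Jobs within one set may run in parallel; each set must
      # finish before the next one is started.
      [  # jobset 1
        [("OP_INSTANCE_MIGRATE", {"instance_name": "inst1"})],
        [("OP_INSTANCE_MIGRATE", {"instance_name": "inst2"})],
      ],
      [  # jobset 2, started only after jobset 1 has finished
        [("OP_INSTANCE_REPLACE_DISKS", {"instance_name": "inst1"})],
      ],
    ],
  }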

Specifically, I need this for evacuating whole groups. I expect other
opcodes could make use of it as well, e.g. evacuating a node
(OpNodeEvacStrategy).

I want to make it possible to use this feature from within
LU-generated jobs. Since the job ID isn't known at the time the
opcodes are generated, there'll need to be some mechanism to describe
the dependencies. Ah, this is where submitting the jobs directly from
the LU would've been handy. :-)
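
One conceivable mechanism, sketched here with made-up names, would be
relative references which the job queue resolves to real job IDs at
submission time:

  # Hypothetical sketch: a LU emits its jobs as a list and refers to
  # earlier jobs by negative, relative indices (-1 being the previous
  # job in this submission). Neither the names nor the semantics are
  # decided yet.
  jobs = [
      ["op_migrate_inst1"],   # position 0
      ["op_migrate_inst2"],   # position 1
      ["op_replace_disks"],   # position 2, must run after the two above
  ]
  depends = {2: [-2, -1]}     # resolved to real job IDs on submission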

>> +++ b/doc/design-chained-jobs.rst
>> +One way to work around this limitation is to do some kind of job
>> +grouping in the client code. Once all jobs of a group have finished, the
>> +next group is submitted and waited for. There are different kinds of
>> +clients for Ganeti, some of which don't share code (e.g. Python clients
>> +vs. htools). This design proposes a solution which would be implemented
>> +as part of the job queue in the master daemon.
>
> FYI, for htools at least, the current solution is working well enough,
> so I'm not likely to change over. The rationale is that queue management
> is easier in the current situation as compared to submitting all jobs
> upfront.

Of course htools can continue to work as it has so far. This design
proposes an additional feature.

>> +Proposed changes
>> +================
> […]
>
> Question: does this mean one has to submit the first job, get its id,
> and only then submit the second job where 'depend' contains the id
> gotten from the first job submit?

Yes, so far the job ID is the best identifier for a job (unless one
wants to extend the API and provide a function for submitting jobs
which need to be executed in a certain order). There's no requirement,
however, to submit consecutive jobs right away. The “depend” attribute
just guarantees that a job is executed after the jobs listed in it.
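
In code, the intended usage would look roughly like this; the client
object and its SubmitJob method are hypothetical, only the “depend”
attribute follows this thread:

  def submit_chained(client, first_ops, second_ops):
      # Submit the first job and remember its ID.
      first_id = client.SubmitJob(first_ops)
      # The second job can be submitted right away; "depend" merely
      # guarantees it won't start before the first job has finished.
      second_id = client.SubmitJob(second_ops, depend=[first_id])
      return (first_id, second_id)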

>> +Client-side logic
>> +-----------------
>> +
>> +There's at least one implementation of a batched job executor twisted
>> +into the ``burnin`` tool's code. While certainly possible, a client-side
>> +solution should be avoided due to the different clients already in use.
>> +For one, the :doc:`remote API <rapi>` client shouldn't import
>> +non-standard modules. htools are written in Haskell and can't use Python
>> +modules. A batched job executor contains quite some logic.
>
> Disagree here (last sentence) :)

It's more than just a few lines of code: it needs queueing, checking
the results, etc.
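
For illustration, here's roughly the logic every client would have to
duplicate without server-side support; the client methods and status
strings are assumptions:

  import time

  def run_job_groups(client, job_groups, poll_interval=5.0):
      for group in job_groups:
          # Submit all jobs of the current group up front...
          job_ids = [client.SubmitJob(ops) for ops in group]
          # ...then wait until all of them have finished before
          # moving on to the next group, checking each result.
          pending = set(job_ids)
          while pending:
              for job_id in sorted(pending):
                  status = client.GetJobStatus(job_id)
                  if status in ("success", "error", "canceled"):
                      pending.discard(job_id)
                      if status != "success":
                          raise RuntimeError("Job %s failed (%s)" %
                                             (job_id, status))
              if pending:
                  time.sleep(poll_interval)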

Michael
