This patch adds a design document describing how jobs can be submitted
from within LUs.
---
 Makefile.am                      |    1 +
 doc/design-2.0.rst               |    2 +
 doc/design-2.3.rst               |    2 +
 doc/design-draft.rst             |    2 +
 doc/design-lu-generated-jobs.rst |   88 ++++++++++++++++++++++++++++++++++++++
 5 files changed, 95 insertions(+), 0 deletions(-)
 create mode 100644 doc/design-lu-generated-jobs.rst
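
Note for reviewers: a minimal sketch of the flow proposed in
doc/design-lu-generated-jobs.rst, under stated assumptions. Every name
below (JobsResult, process_result, split_submissions, fake_submit) is
hypothetical and not part of this patch. The submit callback is only
assumed to return one (status, job ID or error message) pair per job,
as the document describes for SubmitManyJobs.

```python
# Illustrative only: all names here are hypothetical, not Ganeti APIs.


class JobsResult:
    """Container an LU could return instead of a plain result."""

    def __init__(self, jobs):
        # A list of jobs, each job being a list of opcodes,
        # e.g. [[op1, op2], [op3]].
        self.jobs = jobs


def process_result(result, submit_many_jobs):
    """Stand-in for the job processor's handling of an LU result.

    submit_many_jobs is assumed to return one (status, data) pair per
    job, where data is a job ID on success or an error message.
    """
    if isinstance(result, JobsResult):
        # Wrap the pairs in a dictionary to allow future extension.
        return {"jobs": submit_many_jobs(result.jobs)}
    return result


def split_submissions(result):
    """Client side: separate submitted job IDs from failed submissions."""
    job_ids = [data for (ok, data) in result["jobs"] if ok]
    errors = [data for (ok, data) in result["jobs"] if not ok]
    return job_ids, errors


def fake_submit(jobs):
    """Toy queue accepting at most two jobs, to show partial failure."""
    return [
        (True, str(100 + idx)) if idx < 2 else (False, "Job queue is full")
        for idx, _ in enumerate(jobs)
    ]


lu_result = JobsResult([["op1", "op2"], ["op3"], ["op4"]])
result = process_result(lu_result, fake_submit)
job_ids, errors = split_submissions(result)
```

Because submission is not atomic, the client ends up with both a list of
job IDs to wait for and a list of errors to report or retry.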

diff --git a/Makefile.am b/Makefile.am
index 81c1dfb..9f254d4 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -244,6 +244,7 @@ docrst = \
        doc/design-draft.rst \
        doc/design-oob.rst \
        doc/design-query2.rst \
+       doc/design-lu-generated-jobs.rst \
        doc/cluster-merge.rst \
        doc/design-shared-storage.rst \
        doc/devnotes.rst \
diff --git a/doc/design-2.0.rst b/doc/design-2.0.rst
index c1acfa1..91c10f7 100644
--- a/doc/design-2.0.rst
+++ b/doc/design-2.0.rst
@@ -608,6 +608,8 @@ be leaf locks or carefully structured non-leaf ones, to avoid deadlock
 race conditions.
 
 
+.. _jqueue-original-design:
+
 Job Queue
 ~~~~~~~~~
 
diff --git a/doc/design-2.3.rst b/doc/design-2.3.rst
index f04ab8e..bdf4755 100644
--- a/doc/design-2.3.rst
+++ b/doc/design-2.3.rst
@@ -568,6 +568,8 @@ removed. Ganeti itself will allow clearing of both flags, even though
 this doesn't make much sense currently.
 
 
+.. _jqueue-job-priority-design:
+
 Job priorities
 --------------
 
diff --git a/doc/design-draft.rst b/doc/design-draft.rst
index addfbdd..fc4558b 100644
--- a/doc/design-draft.rst
+++ b/doc/design-draft.rst
@@ -5,6 +5,8 @@ Design document drafts
 .. toctree::
    :maxdepth: 2
 
+   design-lu-generated-jobs.rst
+
 .. vim: set textwidth=72 :
 .. Local Variables:
 .. mode: rst
diff --git a/doc/design-lu-generated-jobs.rst b/doc/design-lu-generated-jobs.rst
new file mode 100644
index 0000000..4b70390
--- /dev/null
+++ b/doc/design-lu-generated-jobs.rst
@@ -0,0 +1,88 @@
+==================================
+Submitting jobs from logical units
+==================================
+
+.. contents:: :depth: 4
+
+This is a design document about the innards of Ganeti's job processing.
+Readers are advised to study previous design documents on the topic:
+
+- :ref:`Original job queue <jqueue-original-design>`
+- :ref:`Job priorities <jqueue-job-priority-design>`
+
+
+Current state and shortcomings
+==============================
+
+Some Ganeti operations want to execute as many operations in parallel as
+possible. Examples are evacuating or failing over a node (``gnt-node
+evacuate``/``gnt-node failover``). Without changing large parts of the
+code, e.g. the RPC layer, to be asynchronous, or using threads inside a
+logical unit, only a single operation can be executed at a time per job.
+
+Currently clients work around this limitation by retrieving the list of
+desired targets and then re-submitting a number of jobs. This requires
+logic to be kept in the client, in some cases leading to duplication
+(e.g. CLI and RAPI).
+
+
+Proposed changes
+================
+
+The job queue lock is guaranteed to be released while executing an
+opcode/logical unit. This means an opcode can talk to the job queue and
+submit more jobs. It then receives the job IDs, like any job submitter
+using the LUXI interface would. These job IDs are returned to the
+client, which then continues to wait for the jobs to finish.
+
+Technically, the job queue already passes a number of callbacks to the
+opcode processor. These are used for giving user feedback, notifying the
+job queue of an opcode having gotten its locks, and checking whether the
+opcode has been cancelled.
+
+To submit jobs, opcodes can return a list of jobs, each of which is a
+list of opcodes (e.g. ``[[op1, op2], [op3]]``), as an instance of a
+container class. The job processor will recognize this class and submit
+all jobs through a callback function. The result of that callback, a
+status and job ID or error message for each job (equivalent to the job
+queue's ``SubmitManyJobs`` function), will be returned to the client.
+
+Job submissions can fail for multiple reasons, e.g. a full or drained
+job queue. Lists of jobs cannot be submitted atomically, meaning some
+might fail while others succeed. The client is responsible for handling
+such cases.
+
+.. highlight:: javascript
+
+The result should be encapsulated in a dictionary allowing for future
+extension. Proposed structure::
+
+  {
+    "jobs": [
+      (True, "8149"),
+      (True, "21019"),
+      (False, "Submission failed"),
+      (True, "31594"),
+      ],
+  }
+
+
+Other discussed solutions
+=========================
+
+Instead of requiring the client to wait for the returned jobs, another
+idea was to do so from within the submitting opcode in the master
+daemon. While technically possible, doing so would have two major
+drawbacks:
+
+- Opcodes waiting for other jobs to finish block one job queue worker
+  thread
+- All locks must be released before starting the waiting process;
+  failure to do so can lead to deadlocks
+
+
+.. vim: set textwidth=72 :
+.. Local Variables:
+.. mode: rst
+.. fill-column: 72
+.. End:
-- 
1.7.3.5
