Hi Mehdi,

To me, this bug is critical, because it makes the use of the moldable
jobs feature break the advance reservation feature, and both features
are important to users of OAR.

Moldable jobs are used especially on heterogeneous clusters
(e.g. clusters composed of nodes with 2 or more different hardware
specifications, because they were purchased in 2 or more phases for instance).
In that case, a job must be described with several choices of
specifications (e.g. # of cores + total time of execution), one for each
of the different homogeneous subsets of the cluster.
This is quite a common case, met in many installations of OAR.
The advance reservation feature is wanted by users who need to interact
with their job, and who therefore need to fix the job's start time in
advance so that they can be at the machines when it runs. This feature is
used a lot in research testbeds like Grid'5000 (www.grid5000.fr).

I admit that using both the moldable job feature and the advance
reservation feature in the same use case (by the same user) is not very
likely to happen (which also explains why the bug wasn't noticed before
the release). But having some users submitting moldable jobs and other
users making advance reservations on the same system will happen (the bug
was reported quite quickly, actually).

For reference, the error log is the following:

[debug] [2015-02-18 21:35:26.373] [MetaSched] Begin processing of
waiting reservations (accepted reservations which do not have assigned
resources yet)
[debug] [2015-02-18 21:35:26.376] [MetaSched] [2] job is (0,u:,,)
[debug] [2015-02-18 21:35:26.379] [MetaSched] [2] add job occupation in
gantt (0,,,)
[debug] [2015-02-18 21:35:26.379] [MetaSched] [2] Add job in database
Use of uninitialized value in vec at /usr/lib/oar/oar_meta_sched line 342.
Use of uninitialized value $r in vec at /usr/lib/oar/oar_meta_sched line
357.
[debug] [2015-02-18 21:35:26.380] [MetaSched] End processing of waiting
reservations
DBD::Pg::db do failed: ERROR:  syntax error at or near ")"
LINE 2:               VALUES (3,)
                                ^ at /usr/share/perl5/OAR/IO.pm line 6270.

Here, job 1 is a moldable job; scheduling job 2 (the reservation being
processed in the log above) then causes errors in the scheduler code.
As a result, job 2 is neither scheduled nor executed.

The administrator of the cluster will have no option other than to
install the next release of OAR, or the patched version.

One last point: the patch also fixes another bug, regarding the clean-up
of the resource tree structure (calls to
delete_tree_nodes_with_not_enough_resources). This is a regression.
It is part of the patch because it was in the same commit in the
upstream VCS.
We could consider that second issue separately, but I think it is worth
fixing as well, possibly together with the first one.

Hope I convinced you.

Thanks for your time
Best regards,
Pierre

