Hi Again.

I reloaded the Q as before; here are the qsub commands and the "diagnose -p" output after each one:

Q loaded as before:

$ diagnose -p  
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
             Weights   --------       1( 1000)     1(    1)

27                            2     0.0(  0.0) 100.0(  2.1)
28                            2     0.0(  0.0) 100.0(  2.1)
29                            2     0.0(  0.0) 100.0(  2.1)
30                            2     0.0(  0.0) 100.0(  2.1)
31                            2     0.0(  0.0) 100.0(  2.1)
32                            2     0.0(  0.0) 100.0(  2.1)
33                            2     0.0(  0.0) 100.0(  2.1)
34                            2     0.0(  0.0) 100.0(  2.1)
35                            2     0.0(  0.0) 100.0(  2.1)

Percent Contribution   --------     0.0(  0.0) 100.0(100.0)
* indicates system prio set on job
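
For comparing listings like the one above from one submission to the next, here is a minimal Python sketch that pulls the job ID, priority, QOS and QTime values out of "diagnose -p" text. It assumes the column layout shown in this output; the function name parse_diagnose_p and the regular expression are illustrative and not part of Maui.

```python
import re

# Matches a job row such as:
#   "27                            2     0.0(  0.0) 100.0(  2.1)"
# Header, "Weights" and "Percent Contribution" rows do not start with a job ID,
# so they are skipped.
ROW = re.compile(
    r"^\s*(\d+)\s+(-?\d+)\*?\s+"
    r"(-?[\d.]+)\(\s*(-?[\d.]+)\)\s+"
    r"(-?[\d.]+)\(\s*(-?[\d.]+)\)\s*$"
)

def parse_diagnose_p(text):
    """Return {job_id: {"priority", "qos", "qtime"}} for each job row."""
    jobs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            jobs[int(m.group(1))] = {
                "priority": int(m.group(2)),
                "qos": float(m.group(4)),    # value inside Cred(  QOS)
                "qtime": float(m.group(6)),  # value inside Serv(QTime)
            }
    return jobs

sample = """\
Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
             Weights   --------       1( 1000)     1(    1)

27                            2     0.0(  0.0) 100.0(  2.1)
28                            2     0.0(  0.0) 100.0(  2.1)

Percent Contribution   --------     0.0(  0.0) 100.0(100.0)
"""

jobs = parse_diagnose_p(sample)
# Job IDs whose QOS column is zero, i.e. priority comes from queue time only.
no_qos = sorted(j for j, v in jobs.items() if v["qos"] == 0.0)
```

Running the same function on the listing captured after each qsub makes it easy to see which job IDs entered or left the list.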


(as user "tw") qsub -I -q tw -l nodes=6:ppn=64

$ diagnose -p
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
             Weights   --------       1( 1000)     1(    1)

27                            3     0.0(  0.0) 100.0(  3.0)
28                            3     0.0(  0.0) 100.0(  3.0)
29                            3     0.0(  0.0) 100.0(  3.0)
30                            3     0.0(  0.0) 100.0(  3.0)
31                            3     0.0(  0.0) 100.0(  3.0)
32                            3     0.0(  0.0) 100.0(  3.0)
33                            3     0.0(  0.0) 100.0(  3.0)
34                            3     0.0(  0.0) 100.0(  3.0)
35                            3     0.0(  0.0) 100.0(  3.0)
2                             3     0.0(  0.0) 100.0(  3.0)
3                             3     0.0(  0.0) 100.0(  3.0)
4                             3     0.0(  0.0) 100.0(  3.0)
5                             3     0.0(  0.0) 100.0(  3.0)
6                             3     0.0(  0.0) 100.0(  3.0)
7                             3     0.0(  0.0) 100.0(  3.0)
15                            3     0.0(  0.0) 100.0(  3.0)
16                            3     0.0(  0.0) 100.0(  3.0)
17                            3     0.0(  0.0) 100.0(  3.0)
18                            3     0.0(  0.0) 100.0(  3.0)
19                            3     0.0(  0.0) 100.0(  3.0)
20                            3     0.0(  0.0) 100.0(  3.0)

Percent Contribution   --------     0.0(  0.0) 100.0(100.0)
* indicates system prio set on job



(as user "tw") qsub -I -q tw -l nodes=6:ppn=62

$ diagnose -p
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
             Weights   --------       1( 1000)     1(    1)

2                             4     0.0(  0.0) 100.0(  3.7)
3                             4     0.0(  0.0) 100.0(  3.7)
4                             4     0.0(  0.0) 100.0(  3.7)
5                             4     0.0(  0.0) 100.0(  3.7)
6                             4     0.0(  0.0) 100.0(  3.7)
7                             4     0.0(  0.0) 100.0(  3.7)
33                            4     0.0(  0.0) 100.0(  3.7)
34                            4     0.0(  0.0) 100.0(  3.7)
35                            4     0.0(  0.0) 100.0(  3.7)

Percent Contribution   --------     0.0(  0.0) 100.0(100.0)
* indicates system prio set on job


(as user "tw") qsub -I -q tw -l nodes=6:ppn=64

$ diagnose -p
diagnosing job priority information (partition: ALL)

Job                    PRIORITY*   Cred(  QOS)  Serv(QTime)
             Weights   --------       1( 1000)     1(    1)

2                             5     0.0(  0.0) 100.0(  4.8)
3                             5     0.0(  0.0) 100.0(  4.8)
4                             5     0.0(  0.0) 100.0(  4.8)
5                             5     0.0(  0.0) 100.0(  4.8)
6                             5     0.0(  0.0) 100.0(  4.8)
7                             5     0.0(  0.0) 100.0(  4.8)

Percent Contribution   --------     0.0(  0.0) 100.0(100.0)
* indicates system prio set on job

-------------------------------------------------------------

Deleting jobs 33-35 does not make job 38 run. Job 38 does run once the 1-core jobs are deleted, but only after ALL of them are gone.

I also tried:

BACKFILLPOLICY        BESTFIT
RESERVATIONPOLICY     CURRENTHIGHEST

I restarted Maui, reloaded the Q and re-created the condition, but I still have the same issue.


Edsall, William (WJ) wrote:

Hello,

 

The diagnose -p output was truncated. I was hoping to see that jobs 33-35 (queued) did not have a large QTime, which could raise their priority above that of your job 38. That could make job 38 wait even though they are not running. It sounds doubtful in your scenario, but I’ve seen it cause issues before.
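
The arithmetic behind that concern can be sketched as follows. This assumes, as the Weights row of the diagnose -p output suggests, that a job's priority is a weighted sum of the form CredWeight * (QOSWeight * QOS priority) + ServWeight * (QTimeWeight * minutes queued). The function name and the QOS priority of 1 in the second call are hypothetical values chosen for illustration.

```python
def job_priority(cred_w, qos_w, qos_prio, serv_w, qtime_w, queued_minutes):
    """Weighted-sum priority under the assumed two-component formula."""
    cred = cred_w * (qos_w * qos_prio)          # credential (QOS) component
    serv = serv_w * (qtime_w * queued_minutes)  # service (queue time) component
    return cred + serv

# Weights taken from the listing: Cred 1 (QOS 1000), Serv 1 (QTime 1).
# A queued job with no QOS priority that has waited 2.1 minutes:
idle_job = job_priority(1, 1000, 0, 1, 1, 2.1)

# A hypothetical freshly submitted job whose QOS carries priority 1:
boosted_job = job_priority(1, 1000, 1, 1, 1, 0.0)
```

Under these assumptions, a job without QOS priority would have to wait roughly 1000 minutes before its queue time alone matched a new job with QOS priority 1, so the few minutes of QTime in the listings would not be enough to outrank such a job.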

 

If you delete the Q state jobs 33-35, does your 38 start?

 

We use the same preemption concept you’re trying to achieve, but I’m having a hard time narrowing down the cause of your error. A few small differences in our configuration are the backfill policy and the reservation policy. You might try these settings and then restart Maui:

BACKFILLPOLICY        BESTFIT

RESERVATIONPOLICY     CURRENTHIGHEST



_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers
