Hi,

your $PBS_SPOOL/mom_logs/<logfile> will tell you why it is not there.
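A minimal sketch of that check, assuming the MOM log is a plain-text file and that lines about a job contain its job ID (129.master.perceus.centos here). `mom_log_for_job` is a hypothetical helper, not a TORQUE command, and the actual log file name under $PBS_SPOOL/mom_logs/ depends on your installation:

```shell
# Hypothetical helper: print every line of a MOM log that mentions a job ID.
# Usage: mom_log_for_job <logfile> <jobid>
mom_log_for_job() {
    logfile=$1
    jobid=$2
    # -F treats the job ID as a fixed string, so its dots are not wildcards.
    grep -F -- "$jobid" "$logfile"
}
```

Run on node001, this would be called as `mom_log_for_job "$PBS_SPOOL/mom_logs/<logfile>" 129.master.perceus.centos`.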

Best regards,

Danny

Yanan Sun wrote:
-bash-3.1$ echo "perl helloworld.pl node001" |qsub -q short -l nodes=node001
129.master.perceus.centos
-bash-3.1$ qstat -f
Job Id: 129.master.perceus.centos
    Job_Name = STDIN
    Job_Owner = [EMAIL PROTECTED]
    resources_used.cput = 00:00:00
    resources_used.mem = 0kb
    resources_used.vmem = 0kb
    resources_used.walltime = 00:00:00
    job_state = E
    queue = short
    server = master.perceus.centos
    Checkpoint = u
    ctime = Tue Aug  5 09:37:01 2008
    Error_Path = master.perceus.centos:/home/ys/STDIN.e129
    exec_host = node001/0
    Hold_Types = n
    Join_Path = n
    Keep_Files = n
    Mail_Points = a
    mtime = Tue Aug  5 09:37:02 2008
    Output_Path = master.perceus.centos:/home/ys/STDIN.o129
    Priority = 0
    qtime = Tue Aug  5 09:37:01 2008
    Rerunable = True
    Resource_List.nodect = 1
    Resource_List.nodes = node001
    Resource_List.walltime = 00:05:00
    session_id = 6452
    Variable_List = PBS_O_HOME=/home/ys,PBS_O_LANG=en_US.UTF-8,
        PBS_O_LOGNAME=ys,
        PBS_O_PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/home/ys/Desktop/cmake-2.6.0-Linux-i386/bin,
        PBS_O_MAIL=/var/spool/mail/ys,PBS_O_SHELL=/bin/bash,
        PBS_O_HOST=master.perceus.centos,PBS_O_WORKDIR=/home/ys,
        PBS_O_QUEUE=short
    etime = Tue Aug  5 09:37:01 2008
    exit_status = 126
    submit_args = -q short -l nodes=node001

So the error file should be in /home/ys/,
but it is not there.

thanks.

Yanan
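
The `exit_status = 126` in the qstat output may be a useful hint. By shell convention, 126 means a command was found but could not be executed, typically because of missing execute permission. Whether TORQUE is passing through the job shell's status in this case is an assumption; the MOM log on node001 should confirm it. A small sketch that reproduces status 126 locally, independent of TORQUE:

```shell
# Create a script without the execute bit and try to run it.
script=$(mktemp)
printf '#!/bin/sh\necho hello\n' > "$script"
chmod 644 "$script"            # readable, but not executable

"$script" 2>/dev/null          # the shell finds the file but cannot execute it
status=$?
echo "exit status: $status"    # POSIX shells report 126 here

rm -f "$script"
```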


On Fri, Aug 1, 2008 at 3:28 AM, Danny Sternkopf <[EMAIL PROTECTED]> wrote:
Hi,

Check your MOM log file in $PBS_SPOOL/mom_logs/ on the node which was set
offline.

So there is a difference between node001 and node002.

If you run 'qstat -f' on the job, you can see where the *.o and *.e files will
be stored. Check whether the target host named there can be accessed from
these two nodes.
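
A rough sketch of the first half of that check, assuming the `qstat -f` layout shown elsewhere in this thread (`Output_Path = host:/path`, with the host on the same line as the attribute name). `output_hosts` is a hypothetical helper name:

```shell
# Hypothetical helper: read `qstat -f` output on stdin and print the
# host part of Output_Path and Error_Path, one attribute per line.
output_hosts() {
    awk '$1 == "Output_Path" || $1 == "Error_Path" {
        split($3, parts, ":")    # $3 looks like host:/path
        print $1, parts[1]
    }'
}
```

For example, `qstat -f 129 | output_hosts` would list the target host for each file; you can then check on node001 and node002 that this name resolves, e.g. with `getent hosts <host>`.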

Best regards,

Danny

Yanan Sun wrote:
I added two nodes to the cluster, node001 and node002.
If I keep both free, I don't get any STDIN.e# and STDIN.o# files.
If I put one offline, I get both files.
Does anyone know why?


Yanan
_______________________________________________
mauiusers mailing list
[email protected]
http://www.supercluster.org/mailman/listinfo/mauiusers

_______________________________________________
torqueusers mailing list
[EMAIL PROTECTED]
http://www.supercluster.org/mailman/listinfo/torqueusers



--
Danny Sternkopf http://www.nec.de/hpc       [EMAIL PROTECTED]
HPCE Division  Germany phone: +49-711-68770-35 fax: +49-711-6877145
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
NEC Deutschland GmbH, Hansaallee 101, 40549 Düsseldorf
Geschäftsführer Yuya Momose
Handelsregister Düsseldorf HRB 57941; VAT ID DE129424743
