Hi,

> I am not sure what you mean by this ... what is a "batch accounting log"
> and how did you "provide" it to GC3Pie?
In Torque, the `tracejob` command requires the job accounting/log files in order 
to retrieve the status of a job.
The command `eb abc.eb --job ..` is executed on a cluster head node, which is 
used for job submission.
The head nodes here do not have access to the job accounting files.

Below is the relevant part of the debug logs. Both qstat and tracejob exit with 
status 0.
Note the message "WARNING Failed removing remote folder .." below. I guess this 
is only a warning and does not explain the problem. The directory ~/.gc3pie_jobs 
is located on NFS.
If you need more input, please let me know.
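For context, the state translation that pbs.py performs on the `qstat` status 
column (the 'R' -> RUNNING and 'C' -> TERMINATING transitions visible in the 
logs below) can be sketched roughly like this. This is a simplified 
illustration, not the actual gc3libs mapping table, and the function name is 
made up for this example:

```python
# Illustrative sketch of mapping a PBS/Torque `qstat` one-letter state
# code to a coarse job state, as pbs.py does around line 256.
# NOTE: this table is an approximation for discussion, not gc3libs code.
PBS_STATE_MAP = {
    "Q": "SUBMITTED",    # queued, waiting for resources
    "R": "RUNNING",      # job is running
    "E": "RUNNING",      # exiting, still owned by the batch system
    "C": "TERMINATING",  # completed; triggers the `tracejob` accounting lookup
    "H": "STOPPED",      # job is held
}

def translate_qstat_code(code):
    """Return a coarse job state for a one-letter qstat status code."""
    return PBS_STATE_MAP.get(code, "UNKNOWN")
```

So the 'C' seen at 09:31:40 is what makes GC3Pie call `tracejob` for the 
accounting information in the first place.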

...........
== 2016-05-27 09:31:20,394 pbs.py:356 INFO Updated resource 'nehalem' status: 
free slots: -1, total running: 0, own running jobs: 0, own queued jobs: 0, 
total queued jobs: 0
== 2016-05-27 09:31:20,395 core.py:1671 DEBUG Engine.stats: Restricting to 
object of class 'Application'
== 2016-05-27 09:31:20,395 build_log.py:227 INFO GC3Pie job overview: 1 running 
(total: 1)
== 2016-05-27 09:31:30,405 workflow.py:422 DEBUG Task #0 in state RUNNING
== 2016-05-27 09:31:30,405 __init__.py:1811 DEBUG Calling state-transition 
handler 'running' on <gc3libs.workflow.DependentTaskCollection object at 
0x19f95d0> ...
== 2016-05-27 09:31:30,405 core.py:435 DEBUG About to update state of 
application: Application@19f9990 (currently: RUNNING)
== 2016-05-27 09:31:30,406 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:30,406 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:30,406 batch.py:567 DEBUG Checking remote job status with 
'qstat 1076106 | grep ^1076106' ...
== 2016-05-27 09:31:30,420 transport.py:943 DEBUG Executed local command 'qstat 
1076106 | grep ^1076106', got exit status: 0
== 2016-05-27 09:31:30,420 pbs.py:256 DEBUG translating PBS/Torque's `qstat` 
code 'R' to gc3libs.Run.State
== 2016-05-27 09:31:30,420 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:30,420 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:30,420 pbs.py:328 DEBUG Running `qstat -a`...
== 2016-05-27 09:31:30,483 transport.py:943 DEBUG Executed local command 'qstat 
-a', got exit status: 0
== 2016-05-27 09:31:30,483 pbs.py:338 DEBUG Computing updated values for 
total/available slots ...
== 2016-05-27 09:31:30,483 pbs.py:170 INFO Output line:
..........
== 2016-05-27 09:31:30,758 pbs.py:356 INFO Updated resource 'nehalem' status: 
free slots: -1, total running: 0, own running jobs: 0, own queued jobs: 0, 
total queued jobs: 0
== 2016-05-27 09:31:30,758 core.py:1671 DEBUG Engine.stats: Restricting to 
object of class 'Application'
== 2016-05-27 09:31:30,758 build_log.py:227 INFO GC3Pie job overview: 1 running 
(total: 1)
== 2016-05-27 09:31:40,768 workflow.py:422 DEBUG Task #0 in state RUNNING
== 2016-05-27 09:31:40,768 core.py:435 DEBUG About to update state of 
application: Application@19f9990 (currently: RUNNING)
== 2016-05-27 09:31:40,768 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:40,769 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:40,769 batch.py:567 DEBUG Checking remote job status with 
'qstat 1076106 | grep ^1076106' ...
== 2016-05-27 09:31:40,784 transport.py:943 DEBUG Executed local command 'qstat 
1076106 | grep ^1076106', got exit status: 0
== 2016-05-27 09:31:40,785 pbs.py:256 DEBUG translating PBS/Torque's `qstat` 
code 'C' to gc3libs.Run.State
== 2016-05-27 09:31:40,785 __init__.py:1811 DEBUG Calling state-transition 
handler 'terminating' on Application@19f9990 ...
== 2016-05-27 09:31:40,785 batch.py:603 DEBUG Retrieving accounting information 
using command 'tracejob 1076106' ...
== 2016-05-27 09:31:40,899 transport.py:943 DEBUG Executed local command 
'tracejob 1076106', got exit status: 0
== 2016-05-27 09:31:40,899 __init__.py:1811 DEBUG Calling state-transition 
handler 'terminating' on <gc3libs.workflow.ParallelTaskCollection object at 
0x1d37a50> ...
== 2016-05-27 09:31:40,899 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:40,900 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:40,900 pbs.py:328 DEBUG Running `qstat -a`...
== 2016-05-27 09:31:40,962 transport.py:943 DEBUG Executed local command 'qstat 
-a', got exit status: 0
== 2016-05-27 09:31:40,963 pbs.py:338 DEBUG Computing updated values for 
total/available slots ...
== 2016-05-27 09:31:40,963 pbs.py:170 INFO Output line:
...........
== 2016-05-27 09:31:41,241 pbs.py:356 INFO Updated resource 'nehalem' status: 
free slots: -1, total running: 0, own running jobs: 0, own queued jobs: 0, 
total queued jobs: 0
== 2016-05-27 09:31:41,241 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:41,242 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:41,243 batch.py:771 DEBUG Downloading job output into 
'/sw-new/apps/css.easyconfigs/repo/nehalem/job-log' ...
== 2016-05-27 09:31:41,247 core.py:663 DEBUG Downloaded output of 
'Application@19f9990' (which is in state TERMINATING)
== 2016-05-27 09:31:41,247 core.py:669 DEBUG Final output of 
'Application@19f9990' retrieved
== 2016-05-27 09:31:41,247 __init__.py:1811 DEBUG Calling state-transition 
handler 'terminated' on Application@19f9990 ...
== 2016-05-27 09:31:41,247 __init__.py:1811 DEBUG Calling state-transition 
handler 'terminated' on <gc3libs.workflow.ParallelTaskCollection object at 
0x1d37a50> ...
== 2016-05-27 09:31:41,248 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:41,248 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:41,248 transport.py:1049 DEBUG 
LocalTransport.remove_tree(): removing local directory tree 
'/home/nanava/.gc3pie_jobs/lrms_job.HQkZnO8Fhm'
== 2016-05-27 09:31:41,264 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:41,264 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:41,264 transport.py:1049 DEBUG 
LocalTransport.remove_tree(): removing local directory tree 
'/home/nanava/.gc3pie_jobs/lrms_job.HQkZnO8Fhm'
== 2016-05-27 09:31:41,265 batch.py:740 WARNING Failed removing remote folder 
'/home/nanava/.gc3pie_jobs/lrms_job.HQkZnO8Fhm': <class 
'gc3libs.exceptions.TransportError'>: Could not remove directory tree 
'/home/nanava/.gc3pie_jobs/lrms_job.HQkZnO8Fhm': OSError: [Errno 2] No such 
file or directory: '/home/nanava/.gc3pie_jobs/lrms_job.HQkZnO8Fhm'
== 2016-05-27 09:31:41,265 core.py:1671 DEBUG Engine.stats: Restricting to 
object of class 'Application'
== 2016-05-27 09:31:41,265 build_log.py:227 INFO GC3Pie job overview: 1 
terminated, 1 failed (total: 1)
== 2016-05-27 09:31:51,275 workflow.py:422 DEBUG Task #0 in state TERMINATED
== 2016-05-27 09:31:51,276 __init__.py:1811 DEBUG Calling state-transition 
handler 'terminated' on <gc3libs.workflow.DependentTaskCollection object at 
0x19f95d0> ...
== 2016-05-27 09:31:51,276 __init__.py:169 DEBUG Checking auth: NoneAuth 
(always successful).
== 2016-05-27 09:31:51,276 transport.py:893 DEBUG Opening LocalTransport...
== 2016-05-27 09:31:51,277 pbs.py:328 DEBUG Running `qstat -a`...
== 2016-05-27 09:31:51,332 transport.py:943 DEBUG Executed local command 'qstat 
-a', got exit status: 0
== 2016-05-27 09:31:51,333 pbs.py:338 DEBUG Computing updated values for 
total/available slots ...
..........
== 2016-05-27 09:31:51,628 pbs.py:356 INFO Updated resource 'nehalem' status: 
free slots: -1, total running: 0, own running jobs: 0, own queued jobs: 0, 
total queued jobs: 0
== 2016-05-27 09:31:51,629 core.py:1671 DEBUG Engine.stats: Restricting to 
object of class 'Application'
== 2016-05-27 09:31:51,629 build_log.py:227 INFO GC3Pie job overview: 1 
terminated, 1 failed (total: 1)
== 2016-05-27 09:32:01,639 build_log.py:227 INFO Done processing jobs
== 2016-05-27 09:32:01,640 core.py:1671 DEBUG Engine.stats: Restricting to 
object of class 'Application'
== 2016-05-27 09:32:01,640 build_log.py:227 INFO GC3Pie job overview: 1 
terminated, 1 failed (total: 1)


> tracejob 1076106

/var/spool/torque/server_logs/20160527: No such file or directory
/var/spool/torque/mom_logs/20160527: No such file or directory
/var/spool/torque/sched_logs/20160527: No such file or directory

Job: 1076106.batch.css.lan

05/27/2016 09:31:10  A    queue=all
05/27/2016 09:31:14  A    user=nanava group=zzzz jobname="Bonnie++-1.03e-" 
queue=all ctime=1464334270 qtime=1464334270 etime=1464334270 start=1464334274 
owner=nanava
                          
exec_host=tane-n041/0+tane-n041/1+tane-n041/2+tane-n041/3+tane-n041/4+tane-n041/5+tane-n041/6+tane-n041/7+tane-n041/8+tane-n041/9+tane-n041/10+tane-n041/11
                          Resource_List.mem=1800mb 
Resource_List.neednodes=1:ppn=12 Resource_List.nodect=1 
Resource_List.nodes=1:ppn=12 Resource_List.walltime=24:00:00
05/27/2016 09:31:39  A    user=nanava group=nanava jobname="Bonnie++-1.03e-" 
queue=all ctime=1464334270 qtime=1464334270 etime=1464334270 start=1464334274 
owner=nanava
                          
exec_host=tane-n041/0+tane-n041/1+tane-n041/2+tane-n041/3+tane-n041/4+tane-n041/5+tane-n041/6+tane-n041/7+tane-n041/8+tane-n041/9+tane-n041/10+tane-n041/11
                          Resource_List.mem=1800mb 
Resource_List.neednodes=1:ppn=12 Resource_List.nodect=1 
Resource_List.nodes=1:ppn=12 Resource_List.walltime=24:00:00 session=1213
                          total_execution_slots=12 unique_node_count=1 
end=1464334299 Exit_status=0 resources_used.cput=00:00:10 
resources_used.mem=34836kb resources_used.vmem=224008kb
                          resources_used.walltime=00:00:26
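Since `tracejob` exits with status 0 even when the per-day log directories are 
missing (as the "No such file or directory" lines above show), the return code 
alone does not tell you whether the accounting record was actually found. A 
hypothetical helper (not part of GC3Pie) could instead look for the 
`Exit_status=` field in the tracejob output:

```python
import re

def parse_tracejob_exit_status(tracejob_output):
    """Extract the job's Exit_status from `tracejob` output.

    Returns None when no accounting record is present, e.g. when the
    accounting files are unreadable on the submitting host.
    """
    m = re.search(r"Exit_status=(-?\d+)", tracejob_output)
    return int(m.group(1)) if m else None

# Using a fragment of the accounting record shown above:
sample = "end=1464334299 Exit_status=0 resources_used.cput=00:00:10"
parse_tracejob_exit_status(sample)   # -> 0
parse_tracejob_exit_status("tracejob: no such job")  # -> None
```

In this run the record *is* present (Exit_status=0), so only the 
server/mom/sched logs seem to be missing on this host, not the accounting data 
itself.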

