On Dec 14, 2011, at 6:13 PM, David Matthews wrote:

> Hi Guys,
> 
> Sorry to be a pain but this seems to be getting worse for us. Here are the 
> latest tracebacks - any suggestions would be gratefully received!!

Hi David,

As the MemoryError indicates, the Galaxy process is running out of memory.  
Having debug = False is actually the preferable setting.  I asked because 
debug = True could easily result in the behavior you're seeing.

The pbs code definitely has a memory leak, which I believe is within libtorque 
or pbs_python.  Because of this, I restart my job runner process once its 
memory usage reaches a certain threshold.  However, this may not be the cause 
of your errors.  To figure it out, we'll need to know exactly which thread is 
consuming the memory.  You may want to enable the heartbeat log and look there 
to see which threads are active.
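
To make the two ideas above a bit more concrete, here is a rough sketch, not 
Galaxy's actual heartbeat or job runner code.  It assumes a Linux host (the 
"VmRSS" field of /proc/<pid>/status) and Python 3.  The helper names rss_kb, 
over_limit and dump_thread_stacks are made up for illustration, and the memory 
limit is whatever you decide is reasonable for your server.

```python
import sys
import threading
import traceback


def rss_kb(status_text):
    """Return the VmRSS value (in kB) found in /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB".
            return int(line.split()[1])
    return None


def over_limit(status_text, limit_kb):
    """True when resident memory is known and exceeds limit_kb (hypothetical limit)."""
    rss = rss_kb(status_text)
    return rss is not None and rss > limit_kb


def dump_thread_stacks():
    """Return a text report with the current stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    report = []
    for ident, frame in sys._current_frames().items():
        report.append("Thread %s (%s)" % (ident, names.get(ident, "unknown")))
        report.extend(traceback.format_stack(frame))
    return "\n".join(report)
```

An external watchdog (cron, for example) could read /proc/<pid>/status for the 
job runner, pass the text to over_limit(), and restart the process when it 
returns True.  Writing dump_thread_stacks() to a log at intervals shows which 
threads are active, though not how much memory each one holds.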

My question about the path was meant to find out whether these errors occur 
immediately upon running a tophat job, without any interaction, or only when 
you click to view the job's output or use some other part of the Galaxy 
interface.

Thanks,
--nate

> 
> Cheers
> David
> 
> 
> 
>> galaxy.jobs.runners.pbs ERROR 2011-12-13 19:57:57,689 Uncaught exception 
>> checking jobs
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py",
>>  line 338, in monitor
>>   self.check_watched_items()
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py",
>>  line 351, in check_watched_items
>>   ( failures, statuses ) = self.check_all_jobs()
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py",
>>  line 462, in check_all_jobs
>>   statuses.update( self.convert_statjob_to_bunches( jobs ) )
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/lib/galaxy/jobs/runners/pbs.py",
>>  line 476, in convert_statjob_to_bunches
>>   statuses[ job.name ] = Bunch( **status )
>> MemoryError
>> Unhandled exception in thread started by
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 580, in __bootstrap_inner
>> MemoryError
>> Unhandled exception in thread started by <bound method Thread.__bootstrap of 
>> <Thread(Thread-11, stopped 1111390528)>>
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 545, in __bootstrap_inner
>> MemoryError
>> Unexpected exception in worker <function <lambda> at 0x883acf8>
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 863, in worker_thread_callback
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1037, in <lambda>
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1056, in process_request_in_thread
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1044, in handle_error
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 
>> 334, in handle_error
>> MemoryError
>> Unhandled exception in thread started by <bound method Thread.__bootstrap of 
>> <Thread(Thread-10, stopped 1109289280)>>
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 545, in __bootstrap_inner
>> MemoryError
>> ----------------------------------------
>> Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 
>> 44389)
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1053, in process_request_in_thread
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 
>> 322, in finish_request
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 
>> 616, in __init__
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 
>> 657, in setup
>> MemoryError
>> ----------------------------------------
>> ----------------------------------------
>> Exception happened during processing of request from ('xxx.xxx.xx.xx', 60069)
>> Unexpected exception in worker <function <lambda> at 0x883a2a8>Traceback 
>> (most recent call last):
>> 
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1053, in process_request_in_thread
>> Unhandled exception in thread started by <bound method Thread.__bootstrap of 
>> <Thread(worker 9, stopped 1130301760)>>
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 545, in __bootstrap_inner
>> MemoryError  File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/SocketServer.py", line 322, 
>> in finish_request
>> 
>> Unexpected exception in worker <function <lambda> at 0x8721410>
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 863, in worker_thread_callback
>> Unhandled exception in thread started by <bound method Thread.__bootstrap of 
>> <Thread(worker 0, stopped 1086265664)>>
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 545, in __bootstrap_inner
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 242, in format_exc
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 142, in format_exception
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, 
>> in format_tb
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 101, in extract_tb
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, 
>> in getline
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, 
>> in getlines
>> MemoryError
>> ----------------------------------------
>> Exception happened during processing of request from ('xxx.xxx.xx.xx', 60071)
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1053, in process_request_in_thread
>> Unexpected exception in worker <function <lambda> at 0x8721410>
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 863, in worker_thread_callback
>> Unhandled exception in thread started by <bound method Thread.__bootstrap of 
>> <Thread(worker 6, stopped 1123998016)>>
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>>   self.__bootstrap_inner()
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 545, in __bootstrap_inner
>>   (self.name, _format_exc()))
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 242, in format_exc
>>   return ''.join(format_exception(etype, value, tb, limit))
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 142, in format_exception
>>   list = list + format_tb(tb, limit)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, 
>> in format_tb
>>   return format_list(extract_tb(tb, limit))
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 101, in extract_tb
>>   line = linecache.getline(filename, lineno, f.f_globals)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, 
>> in getline
>>   lines = getlines(filename, module_globals)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, 
>> in getlines
>>   return updatecache(filename, module_globals)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 
>> 131, in updatecache
>>   lines = fp.readlines()
>> MemoryError
>> ----------------------------------------
>> Exception happened during processing of request from ('xxx.xxx.xxx.xxx', 
>> 44416)
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 1053, in process_request_in_thread
>> Unexpected exception in worker <function <lambda> at 0x8721410>
>> Traceback (most recent call last):
>> File 
>> "/gpfs/cluster/isys/galaxy/Galaxy/galaxy-dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py",
>>  line 863, in worker_thread_callback
>> Unhandled exception in thread started by <bound method Thread.__bootstrap of 
>> <Thread(worker 7, stopped 1126099264)>>
>> Traceback (most recent call last):
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 504, in __bootstrap
>>   self.__bootstrap_inner()
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/threading.py", line 
>> 545, in __bootstrap_inner
>>   (self.name, _format_exc()))
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 242, in format_exc
>>   return ''.join(format_exception(etype, value, tb, limit))
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 142, in format_exception
>>   list = list + format_tb(tb, limit)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 76, 
>> in format_tb
>>   return format_list(extract_tb(tb, limit))
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/traceback.py", line 
>> 101, in extract_tb
>>   line = linecache.getline(filename, lineno, f.f_globals)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 14, 
>> in getline
>>   lines = getlines(filename, module_globals)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 40, 
>> in getlines
>>   return updatecache(filename, module_globals)
>> File "/gpfs/cluster/isys/galaxy/Galaxy/lib/python2.6/linecache.py", line 
>> 131, in updatecache
>>   lines = fp.readlines()
>> MemoryError
>> ----------------------------------------
>> 
>> -- 
>> -----------------------------------------------------------
>> Callum Wright                                
>> HPC Systems Administrator            
>> High Performance Computing
>> University of Bristol
>> 
>> Phone:          0117 331 4429
>> email:          c.wri...@bristol.ac.uk
>> web:            www.acrc.bristol.ac.uk
>> -----------------------------------------------------------
>> 
> 
> 


___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
