Hi Nate,

Thank you for your answers and for replying so quickly!

Cheers,
Cyril

On 01/10/2013 06:26 PM, Nate Coraor wrote:
On Jan 10, 2013, at 4:43 AM, MONJEAUD wrote:

Hi,

Indeed, you are right. In the database, the "job_runner_external_id" column is 
empty for all the jobs that cause Galaxy to crash when they are stopped. I launched 
the instance without the --daemon option and got the segmentation fault you 
suspected:

run.sh: line 77: 19622 Segmentation fault      (core dumped) /local/python/2.7-bis/bin/python ./scripts/paster.py serve universe_wsgi.ini

If I understand correctly, we can't delete a job in the "new" state (associated with an empty 
"job_runner_external_id" column)?

Hi Cyril,

This was due to a bug, which has been fixed in c015b82b3944.
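
For readers hitting the same crash on an older revision, here is a minimal sketch of the kind of guard that avoids asking the cluster to stop a job that has no external ID. It is not the change made in c015b82b3944, which is not shown in this thread; `stop_job` and the `terminate` callable are hypothetical names used only for illustration.

```python
def stop_job(external_id, terminate):
    """Ask the cluster to stop a job, but only if it has an external ID.

    external_id: value of the job's "job_runner_external_id" column
                 (may be None or an empty string).
    terminate:   hypothetical callable that forwards the ID to the DRM;
                 it stands in for whatever the runner really calls.
    Returns True if a stop request was sent, False if it was skipped.
    """
    if external_id is None or str(external_id).strip() == "":
        # In this thread, the jobs that crashed the server had an empty
        # external ID, so there is nothing to stop on the cluster side.
        return False
    terminate(str(external_id))
    return True


if __name__ == "__main__":
    sent = []  # records the IDs that would have been passed to the DRM
    print(stop_job(None, sent.append))
    print(stop_job("4711", sent.append))
    print(sent)
```

Passing the terminate call in as an argument keeps the sketch runnable without a cluster; in a real runner that argument would wrap the DRMAA call.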

--nate

Thanks,
Cyril



On 01/09/2013 07:51 PM, Nate Coraor wrote:
Hi Cyril,

If you start the server in the foreground (no --daemon option), is there a 
segfault when the process dies?  If so, this is most likely a problem where a 
job is attempting to be stopped that does not have an external job ID set.  
Could you check this in the database for one of the jobs that's causing this 
(e.g. 3065)?
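
A rough sketch of that database check, in case it helps others. The column name "job_runner_external_id" comes from this thread; the table name `job`, the `state` column, and the in-memory SQLite demo with invented rows are assumptions for illustration. Against a real Galaxy database you would run the same query through your own connection.

```python
import sqlite3


def jobs_missing_external_id(conn, job_ids=None):
    """Return (id, state) for jobs whose job_runner_external_id is NULL or empty.

    job_ids: optional list of job IDs to restrict the check to.
    """
    query = (
        "SELECT id, state FROM job "
        "WHERE (job_runner_external_id IS NULL OR job_runner_external_id = '')"
    )
    params = []
    if job_ids:
        # One placeholder per ID keeps the query parameterized.
        query += " AND id IN (%s)" % ",".join("?" for _ in job_ids)
        params.extend(job_ids)
    query += " ORDER BY id"
    return conn.execute(query, params).fetchall()


def make_demo_db():
    """Build a throwaway in-memory table with made-up rows (not real data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, state TEXT, job_runner_external_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO job VALUES (?, ?, ?)",
        [(1, "running", "4711"), (2, "new", None), (3, "queued", "")],
    )
    return conn


if __name__ == "__main__":
    for job_id, state in jobs_missing_external_id(make_demo_db()):
        print(job_id, state)
```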

Thanks,
--nate

On Jan 9, 2013, at 4:39 AM, MONJEAUD wrote:

Hello All,

After more research, I found that the Galaxy server crash is caused by 
stopping jobs. We are working with our own SGE cluster.

It's weird, because we can kill jobs via the history or the administration panel 
without any problem.

In paster.log, we only get this message before the server crashes:

galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 Stopping job 3065:
galaxy.jobs.handler DEBUG 2013-01-08 16:52:39,877 stopping job 3065 in drmaa runner

I think this problem occurs when there are many jobs in the "running", "new" and 
"queued" states.

Cheers,
Cyril


On 01/08/2013 04:11 PM, MONJEAUD wrote:
Hello All,

I'm trying to deploy my instance of Galaxy in production. Some tests we've done 
show that when the number of users connected at the same time is high (more than 20), 
the server stops by itself.

Sometimes, I have this error in the paster.log:

Exception happened during processing of request from ('127.0.0.1', 60575)
Traceback (most recent call last):
  File "/opt/galaxy-dist/eggs/Paste-1.6-py2.7.egg/paste/httpserver.py", line 1053, in process_request_in_thread
    self.finish_request(request, client_address)
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 323, in finish_request
    self.RequestHandlerClass(request, client_address, self)
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 641, in __init__
    self.finish()
  File "/local/python/2.7-bis/lib/python2.7/SocketServer.py", line 694, in finish
    self.wfile.flush()
  File "/local/python/2.7-bis/lib/python2.7/socket.py", line 301, in flush
    self._sock.sendall(view[write_offset:write_offset+buffer_size])
error: [Errno 32] Broken pipe

Do you have any ideas about this and how to resolve it?

Cheers!!
Cyril

--

Cyril Monjeaud
Equipe Symbiose / Plate-forme GenOuest
Bureau D156
IRISA-INRIA, Campus de Beaulieu
35042 Rennes cedex, France
Tél: +33 (0) 2 99 84 74 17

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

http://lists.bx.psu.edu/


