Re: [galaxy-dev] Recommended Specs for Production System

2011-04-11 Thread Glen Beane

On Apr 8, 2011, at 11:01 AM, Dave Walton wrote:

 This is very close to our config, except -
 We run all of this on a 4 core Virtual Machine running SUSE Linux Enterprise
 Server 11 (x86_64) with 16 GB of memory.
 
 Instead of SGE our HPC cluster uses Torque/Moab for scheduling.
 
 Also, we've set up a separate I/O node for uploading data files from the file
 system and FTP (correct me if I misspoke, Glen).
 
 Also, instead of apache we run nginx for our httpd server as it was easy to
 get off-loading of file upload and download working with that server.
 
 We're not seeing a heavy load from users at this point, but this has worked
 pretty well for us so far.
 
 Hope this helps,
 
 Dave
 


The only reason we offload the upload jobs somewhere other than our HPC cluster 
is that our cluster nodes do not see the outside world.  Our IT folks did not 
really want to change the network configuration, so we installed TORQUE on a 
spare Linux server, mounted our galaxy network storage on it, and we set up 
some upload-specific job runners that send those jobs to that node.  If you 
have NAT set up on your cluster you probably don't need to worry about that.
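For what it's worth, this kind of routing could be sketched in universe_wsgi.ini along roughly these lines. This is an illustrative fragment only: the server and queue names are invented, and the exact pbs:// URL syntax should be checked against your Galaxy version's cluster documentation.

```ini
# universe_wsgi.ini -- sketch only; "ioserver" and "upload_queue" are
# made-up names for the dedicated I/O node and its TORQUE queue
start_job_runners = pbs
default_cluster_job_runner = pbs:///

[galaxy:tool_runners]
# route the upload tool to the queue served only by the I/O node
upload1 = pbs://ioserver/upload_queue/
```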

We have pretty fat cluster nodes (128GB RAM and 32 cores) since we run a lot 
of multi-threaded jobs on the cluster but not a lot of MPI jobs.  Our NGS 
tools are typically configured to use 16-32 threads.




 
 On 4/8/11 10:21 AM, Assaf Gordon gor...@cshl.edu wrote:
 
 Assaf Gordon wrote, On 04/08/2011 10:07 AM:
 Processes:
 
 The server processes that you should plan for are:
 1 galaxy process for the job runner
 2 or 3 galaxy processes for web front-ends
 1 postgres process
 1 apache process
 optionally, 1 galaxy-reports process
 You'll also want to leave some free CPUs for SSH access, cron jobs and other
 peripherals.
 Postgres & apache are multithreaded, but this usually balances out with the
 light load galaxy puts on the web/DB front (even with 30 users).
 So all in all, I'd recommend reserving 5 to 8 CPU cores just for galaxy and
 daemons (reserving means: never using those cores for galaxy jobs).
 You can get by with fewer cores, but then response times might suffer (and it's
 annoying when you click "show saved histories" and the page takes 20 seconds
 to load...).
 
 Forgot to mention SGE/PBS: you definitely want to use them (even if you're
 using a single machine),
 because the local job runner doesn't take into account multi-threaded 
 programs
 when scheduling jobs.
 So another core is needed for the SGE scheduler daemons (sge_qmaster and
 sge_execd).
 
 
 ___
 Please keep all replies on the list by using reply all
 in your mail client.  To manage your subscriptions to this
 and other Galaxy lists, please use the interface at:
 
  http://lists.bx.psu.edu/
 

--
Glen L. Beane
Senior Software Engineer
The Jackson Laboratory
(207) 288-6153







Re: [galaxy-dev] Recommended Specs for Production System

2011-04-08 Thread Hans-Rudolf Hotz



On 04/07/2011 11:40 PM, Ryan Golhar wrote:

Hi all - So, I've been asked to provide specs for a production Galaxy
system to support approximately 20-30 users. Most of these users are new
to bioinformatics and very new to NGS. I'm targeting a user base that
will use a light to moderate amount of NGS data.

I've looked at the Production Server wiki page, but I'm curious
what everyone else is using or recommends. How big a compute cluster,
how much storage, what proxy/web server configurations, etc?

If you had to deploy a production system, based on what you know, what
would you choose?



Hi Ryan


I would go for a single (multicore) box. With just 20-30 users who are 
'new to bioinformatics' you will hardly ever have more than 3 users 
using Galaxy at the same time - you can always limit the number of 
concurrent galaxy jobs in the universe_wsgi.ini file 
('local_job_queue_workers').
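For reference, that option lived in universe_wsgi.ini; a minimal sketch (the value 5 is just an example, and it only applies when using the local job runner):

```ini
# universe_wsgi.ini -- limit concurrent jobs on a single box;
# the value here is illustrative, not a recommendation
local_job_queue_workers = 5
```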


Since you are expecting NGS data, having the right amount of RAM would 
be my biggest concern. What do you mean by a light to moderate amount of 
NGS data? Are you talking about the number of samples to process, or 
about the size of the individual samples? The latter will have an impact 
on the required amount of RAM, while both will have an impact on the 
amount of storage required.
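To make the storage arithmetic concrete, here is a back-of-envelope sketch in Python. Every number in it (samples per user, GB per sample, the 3x overhead for intermediate datasets) is an invented assumption for illustration, not a recommendation:

```python
# Back-of-envelope storage estimate. All numbers below are
# illustrative assumptions, not measured values.

def storage_estimate_gb(samples: int, gb_per_sample: float,
                        intermediate_factor: float = 3.0) -> float:
    """Raw data plus derived/intermediate datasets per sample."""
    return samples * gb_per_sample * (1.0 + intermediate_factor)

# e.g. 30 users x 4 samples each, 5 GB of raw data per sample,
# ~3x extra for mapped/filtered/converted intermediates
print(storage_estimate_gb(30 * 4, 5.0))  # 2400.0 GB
```

The point of the `intermediate_factor` term is that in Galaxy each processing step typically produces a new dataset, so raw input size alone badly underestimates the storage you will actually consume.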


You have to make the calculations for required storage and RAM first, 
but this is independent of whether you use Galaxy or not. The only risk 
when offering NGS tools via Galaxy is that it might be too easy to run 
them, resulting in a lot of 'garbage' or redundant NGS processing. That's 
why it is important to disable anonymous access, so you can track who is 
doing what.


Using external authentication is very handy. However, it does restrict 
you to users already in your 'network'. We are using it, and it is 
sometimes annoying, as I can't have temporary guest accounts - our IT 
guys would have to create a new 'member' of our institute for every 
guest.



Hope this helps, Hans



Ryan





Re: [galaxy-dev] Recommended Specs for Production System

2011-04-08 Thread Assaf Gordon
Assaf Gordon wrote, On 04/08/2011 10:07 AM:
 Processes:
 
 The server processes that you should plan for are:
 1 galaxy process for the job runner
 2 or 3 galaxy processes for web front-ends
 1 postgres process
 1 apache process
 optionally, 1 galaxy-reports process
 You'll also want to leave some free CPUs for SSH access, cron jobs and other 
 peripherals.
 Postgres & apache are multithreaded, but this usually balances out with the 
 light load galaxy puts on the web/DB front (even with 30 users).
 So all in all, I'd recommend reserving 5 to 8 CPU cores just for galaxy and 
 daemons (reserving means: never using those cores for galaxy jobs).
 You can get by with fewer cores, but then response times might suffer (and it's 
 annoying when you click "show saved histories" and the page takes 20 seconds 
 to load...).
 
Forgot to mention SGE/PBS: you definitely want to use them (even if you're 
using a single machine),
because the local job runner doesn't take into account multi-threaded programs 
when scheduling jobs.
So another core is needed for the SGE scheduler daemons (sge_qmaster and 
sge_execd).
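As a rough sketch of the process split described above, the universe_wsgi.ini of that era could define the web fronts and the job runner as separate server sections, each started with run.sh --server-name=... The section names and port numbers here are illustrative assumptions; check your Galaxy version's documentation for the exact options:

```ini
# universe_wsgi.ini -- illustrative only; names and ports are assumptions
[server:web0]
use = egg:Paste#http
port = 8080

[server:web1]
use = egg:Paste#http
port = 8081

[server:runner0]
use = egg:Paste#http
port = 8090
```

With a split like this, job running would be disabled in the web processes and jobs tracked in the database, leaving runner0 as the single job-runner process; the exact option names for that varied by release, so treat this as a sketch rather than a drop-in config.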




Re: [galaxy-dev] Recommended Specs for Production System

2011-04-08 Thread Nate Coraor
Assaf Gordon wrote:

 Forgot to mention SGE/PBS: you definitely want to use them (even if you're 
 using a single machine),
 because the local job runner doesn't take into account multi-threaded 
 programs when scheduling jobs.
 So another core is needed for the SGE scheduler daemons (sge_qmaster and 
 sge_execd).

I haven't tested, but it's entirely possible that the SGE daemons could
happily share cores with other processes.  I'd be surprised if they
spent a whole lot of time on-CPU.

A cluster runner is recommended for other reasons, too - restartability
of the Galaxy process is one of the big ones.

--nate


Re: [galaxy-dev] Recommended Specs for Production System

2011-04-08 Thread Sean Davis
On Fri, Apr 8, 2011 at 10:26 AM, Nate Coraor n...@bx.psu.edu wrote:
 Assaf Gordon wrote:

 Forgot to mention SGE/PBS: you definitely want to use them (even if you're 
 using a single machine),
 because the local job runner doesn't take into account multi-threaded 
 programs when scheduling jobs.
 So another core is needed for the SGE scheduler daemons (sge_qmaster and 
 sge_execd).

 I haven't tested, but it's entirely possible that the SGE daemons could
 happily share cores with other processes.  I'd be surprised if they
 spent a whole lot of time on-CPU.

We run SGE for NGS and do not find a need to set aside cores for the
daemons.  That said, if you do have an active cluster (more than a
couple of machines), the SGE master node does benefit from having a
core set aside.

Sean

 A cluster runner is recommended for other reasons, too - restartability
 of the Galaxy process is one of the big ones.

 --nate



Re: [galaxy-dev] Recommended Specs for Production System

2011-04-08 Thread Dave Walton
This is very close to our config, except -
We run all of this on a 4 core Virtual Machine running SUSE Linux Enterprise
Server 11 (x86_64) with 16 GB of memory.

Instead of SGE our HPC cluster uses Torque/Moab for scheduling.

Also, we've set up a separate I/O node for uploading data files from the file
system and FTP (correct me if I misspoke, Glen).

Also, instead of apache we run nginx for our httpd server as it was easy to
get off-loading of file upload and download working with that server.
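The download side of that offloading is typically done with nginx's X-Accel-Redirect mechanism: Galaxy replies with an internal redirect header and nginx serves the file directly. A minimal sketch (the location name is illustrative and must match the corresponding redirect-base setting in Galaxy's config; upload offloading additionally needs the separate nginx upload module compiled in):

```nginx
# sketch only -- the location prefix must match the X-Accel-Redirect
# base path that Galaxy is configured to emit
location /_x_accel_redirect/ {
    internal;   # only reachable via an X-Accel-Redirect header from Galaxy
    alias /;    # /_x_accel_redirect/path/to/file serves /path/to/file
}
```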

We're not seeing a heavy load from users at this point, but this has worked
pretty well for us so far.

Hope this helps,

Dave


On 4/8/11 10:21 AM, Assaf Gordon gor...@cshl.edu wrote:

 Assaf Gordon wrote, On 04/08/2011 10:07 AM:
 Processes:
 
 The server processes that you should plan for are:
 1 galaxy process for the job runner
 2 or 3 galaxy processes for web front-ends
 1 postgres process
 1 apache process
 optionally, 1 galaxy-reports process
 You'll also want to leave some free CPUs for SSH access, cron jobs and other
 peripherals.
 Postgres & apache are multithreaded, but this usually balances out with the
 light load galaxy puts on the web/DB front (even with 30 users).
 So all in all, I'd recommend reserving 5 to 8 CPU cores just for galaxy and
 daemons (reserving means: never using those cores for galaxy jobs).
 You can get by with fewer cores, but then response times might suffer (and it's
 annoying when you click "show saved histories" and the page takes 20 seconds
 to load...).
 
 Forgot to mention SGE/PBS: you definitely want to use them (even if you're
 using a single machine),
 because the local job runner doesn't take into account multi-threaded programs
 when scheduling jobs.
 So another core is needed for the SGE scheduler daemons (sge_qmaster and
 sge_execd).
 
 

