I have never set up a Galaxy instance fronting multiple clusters, but it's
something I would like to explore. I have a dedicated cluster for Galaxy
jobs and a shared cluster to which I hope Galaxy can assign jobs when the
dedicated cluster is too busy.
From my understanding of Galaxy, the runner for each tool is hard-coded in
the universe.ini file. If you do not configure a runner for a tool, Galaxy
uses the default runner, which is set by the default_cluster_job_runner
parameter. I believe you can list several job runners for the same tool
under the [galaxy:tool_runners] section of universe.ini, for instance one
per cluster. However, Galaxy will probably use only one of them, most
likely the last one. So cluster selection for a tool is determined by its
job runner, which is hard-coded in universe.ini.
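To make the lookup described above concrete, here is a minimal sketch in
Python. It is not Galaxy's own code. The tool ids, queue names, and runner
URLs in the fragment are made up for illustration, and the "last entry wins"
result is simply how Python's standard parser treats a repeated key when
strict checking is off. I am assuming Galaxy behaves the same way; I have
not confirmed it.

```python
import configparser

# Hypothetical universe.ini fragment. Tool ids, queue names, and runner
# URLs are illustrative only, not taken from a real installation.
UNIVERSE_INI = """
[app:main]
default_cluster_job_runner = drmaa://-q dedicated.q/

[galaxy:tool_runners]
upload1 = local:///
bowtie_wrapper = drmaa://-q dedicated.q/
bowtie_wrapper = drmaa://-q shared.q/
"""


def load_runners(text):
    """Return (default_runner, {tool_id: runner_url}) from an ini string."""
    # strict=False accepts a repeated key; the later value overwrites the
    # earlier one. Only "=" is a delimiter, so URLs keep their colons.
    parser = configparser.ConfigParser(
        strict=False, interpolation=None, delimiters=("=",)
    )
    parser.optionxform = str  # keep tool ids case-sensitive
    parser.read_string(text)
    default = parser.get("app:main", "default_cluster_job_runner")
    per_tool = dict(parser.items("galaxy:tool_runners"))
    return default, per_tool


def runner_for(tool_id, default, per_tool):
    """Use the per-tool runner if one is configured, else the default."""
    return per_tool.get(tool_id, default)


if __name__ == "__main__":
    default, per_tool = load_runners(UNIVERSE_INI)
    for tool_id in ("upload1", "bowtie_wrapper", "some_other_tool"):
        print(tool_id, "->", runner_for(tool_id, default, per_tool))
```

In this sketch the second bowtie_wrapper line silently replaces the first,
so listing one runner per cluster for the same tool gives no failover.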
As a result, where a workflow runs depends on the tools it contains. If
every tool in the workflow is configured to use the same cluster, the whole
workflow runs on that cluster; otherwise, it will span multiple clusters.
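The same reasoning as a short Python sketch, again with hypothetical tool
ids and runner URLs, and assuming each tool maps to exactly one runner as
described above:

```python
# Hypothetical per-tool runner table and default; names are illustrative.
DEFAULT_RUNNER = "drmaa://-q dedicated.q/"
TOOL_RUNNERS = {
    "bowtie_wrapper": "drmaa://-q dedicated.q/",
    "cufflinks": "drmaa://-q shared.q/",
}


def runners_used(workflow_tool_ids, tool_runners, default_runner):
    """Return the set of runner URLs the workflow's tools resolve to."""
    return {tool_runners.get(t, default_runner) for t in workflow_tool_ids}


def spans_clusters(workflow_tool_ids, tool_runners, default_runner):
    """True if the workflow's tools resolve to more than one runner."""
    return len(runners_used(workflow_tool_ids, tool_runners, default_runner)) > 1


if __name__ == "__main__":
    same = ["bowtie_wrapper", "sam_to_bam"]  # both land on dedicated.q
    mixed = ["bowtie_wrapper", "cufflinks"]  # dedicated.q and shared.q
    print(spans_clusters(same, TOOL_RUNNERS, DEFAULT_RUNNER))
    print(spans_clusters(mixed, TOOL_RUNNERS, DEFAULT_RUNNER))
```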
I think that if you can configure the machine running the Galaxy instance
as a submit host for multiple clusters, then it's possible to have Galaxy
front them. For me, the biggest hurdles are giving the two clusters a
shared storage space and configuring a machine in one cluster as a submit
host of the other.
On Mon, Sep 19, 2011 at 9:44 AM, Ann Black <annbl...@eng.uiowa.edu> wrote:
> Hello -
> I am working on standing up our own galaxy installation. We would like to
> have galaxy front multiple clusters, and I have some questions I was hoping
> someone could help with.
> 1) From reading other forum posts on this subject, it seems I need to
> minimally do the following ... is this correct?:
> A) have galaxy server w/ sge register as a job submitting host to the
> head node of each cluster
> B) Configure multiple tool runners for each tool per remote cluster?
> 2) When galaxy would submit a job, how would a backend remote cluster be
> selected? When running workflows, would the same cluster be used to run the
> entire workflow - or could the workflow then span remote clusters?
> 3) I am trying to understand some of the source code, where is the logic
> that would then dispatch the job and select a job runner to use?
> 4) Other advice or steps needed in order to get galaxy to front multiple
> remote clusters?
> Thanks so much,
> Please keep all replies on the list by using "reply all"
> in your mail client. To manage your subscriptions to this
> and other Galaxy lists, please use the interface at: