Hi John, thanks for the reply.
Yes, I mean Galaxy's default behavior of keeping all the data on all nodes
of our Condor cluster. So for instance, if I run a job, the output of
that job is copied to every node in the cluster. Is this not the normal
behavior?
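For reference, restricting a Condor job to a single execute node (the part of this that I was stuck on) can be sketched with a requirements expression in a submit description file. This is just a sketch, not Galaxy's actual submission code, and the hostname is a placeholder:

```
# Minimal HTCondor submit description pinning a job to one machine.
# "node01.example.org" is a hypothetical hostname; substitute a real
# execute node from `condor_status`.
universe     = vanilla
executable   = run_step.sh
requirements = (Machine == "node01.example.org")
output       = step.out
error        = step.err
log          = step.log
queue
```

With a requirements expression like this, the negotiator will only match the job against the named machine, so all of a workflow's steps could in principle be kept on one node.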
On Tue, Dec 17, 2013 at 9:42 AM, John Chilton <chil...@msi.umn.edu> wrote:
> Hey Ben,
> Thanks for the e-mail. I did not promise anything was coming soon, I
> only said people were working on parts of it. It is not a feature yet
> unfortunately - multiple people including myself are thinking about
> various parts of this problem though.
> I would like to respond, but I am trying to understand this line: "We
> can't do this because Galaxy copies all intermediate steps to all
> nodes, which would bog down the servers too much."
> Can you describe how you are doing this staging for me? Is data
> currently being copied around to all the nodes, if so how are you
> doing that? Or are you trying to say that Galaxy requires the data to
> be available on all of the nodes?
> On Tue, Dec 17, 2013 at 11:15 AM, Ben Gift <corn8b...@gmail.com> wrote:
> > We've run into a scenario lately where we need to run a very large
> > workflow (with huge data in intermediate steps) many times. We can't do
> > this because Galaxy copies all intermediate steps to all nodes, which
> > would bog down the servers too much.
> > I asked about something similar before, and John mentioned that a feature
> > to automatically delete intermediate-step data once a workflow completes
> > was coming soon. Is that a feature now? That would help.
> > Ultimately, though, we can't be copying all this data around to all nodes.
> > The network just isn't good enough, so I have an idea.
> > What if we had an option on the 'run workflow' screen to run on only one
> > node (eliminating Galaxy's neat concurrency for that workflow,
> > unfortunately)? Then it would only propagate the final step's data.
> > Or maybe copy to just a couple of other nodes, to keep some concurrency.
> > If the job errors, then in this case I think it should just throw out all
> > the data, or propagate from wherever it stopped.
> > I've been trying to implement this myself, but it's taking me a long
> > time. I've only just started understanding the pyramid stack, and am
> > putting the checkbox into the run.mako template. I still need to learn
> > the database schema, message passing, how jobs are stored, and how to
> > tell Condor to use only one node (and more, I'm sure) in Galaxy. (I'm
> > drowning.)
> > This seems like a really important feature, though, as Galaxy gains more
> > traction as a research tool for bigger projects that demand working with
> > huge data and running huge workflows many, many times.
> > ___________________________________________________________
> > Please keep all replies on the list by using "reply all"
> > in your mail client. To manage your subscriptions to this
> > and other Galaxy lists, please use the interface at:
> > http://lists.bx.psu.edu/
> > To search Galaxy mailing lists use the unified search at:
> > http://galaxyproject.org/search/mailinglists/