I'd like to open up some discussion about incorporating some code bits into the
My code is here:
First off, I'd like to say that these changes started out as hacks to get
Galaxy working with a grid interface for our nefarious purposes. For us, the
results have been spiffy: we can offload a lot of BLAST work from our own
clusters onto the grid, which processes it quickly on a distributed set of
computers.
In order to do this, I wanted to take as much control over the process as I
could. The destination uses Condor, but it uses condor_dag to submit jobs, which
means I would have had to modify the Condor job runner.
The destination also needs the files shipped over to it first, so I had to be
able to stage them. This made lwr attractive, but then I would need to guarantee
that the server at the other end was running lwr, and since I don't control
that server, it did not seem like a good option.
The easiest thing for me to understand was the cli runner. I could use ssh and
scp, so this seemed the best place to start. I began by working out which files
needed to be sent to the server, and then implemented a way to send them. I
start with the stdout, stderr and exit code files. I also stage any datasets
that are in the param_dict, and anything that is in extra_files_path. Then I
alter the command line so that all the paths make sense on the remote server,
and so that the right things are run remotely vs. locally (e.g., metadata.sh is
run locally after the job is done). Right now, this is done by splitting the
command line on a specific string, which is not robust to future changes in the
command_factory, so I'm open to suggestions.
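The staging and command-rewriting steps above can be sketched in Python roughly
as follows. This is a minimal illustration under stated assumptions, not the
code in the branch: the function names, the per-job file names (stdout, stderr,
exit_code) and the ##RUN_LOCALLY## marker string are all hypothetical, and the
sketch only builds the scp/ssh argument lists instead of running them. It splits
on the marker first and rewrites paths only in the remote half, so the local
metadata step keeps its local paths.

```python
import os
import re

# All names below are hypothetical; this is not Galaxy's CLI runner API.
STD_FILES = ("stdout", "stderr", "exit_code")  # assumed per-job file names
LOCAL_MARKER = "##RUN_LOCALLY##"  # assumed delimiter in the generated command


def files_to_stage(job_dir, dataset_paths, extra_files_dirs):
    """Collect the local files the stager has to handle for one job."""
    files = [os.path.join(job_dir, name) for name in STD_FILES]
    files.extend(dataset_paths)  # datasets referenced from param_dict
    for extra_dir in extra_files_dirs:  # everything under each extra_files_path
        for root, _dirs, names in os.walk(extra_dir):
            files.extend(os.path.join(root, name) for name in sorted(names))
    return files


def split_command(command, marker=LOCAL_MARKER):
    """Split into (remote_part, local_part); fragile if the marker ever changes."""
    remote, _sep, local = command.partition(marker)
    return remote.strip(), local.strip()


def rewrite_paths(command, path_map):
    """Replace local paths with remote ones in a single regex pass.

    Longer paths are tried first, so /a/b/c is not rewritten as /a/b plus /c.
    """
    if not path_map:
        return command
    keys = sorted(path_map, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    return pattern.sub(lambda m: path_map[m.group(0)], command)


def build_remote_job(command, path_map, host):
    """Return (ssh_args, local_command): remote half rewritten, local half untouched."""
    remote_part, local_part = split_command(command)
    ssh_args = ["ssh", host, rewrite_paths(remote_part, path_map)]
    return ssh_args, local_part


def scp_args(local_file, host, remote_file):
    """Argument list for copying one file, e.g. for subprocess.run(args, check=True)."""
    return ["scp", local_file, f"{host}:{remote_file}"]
```

For example, calling build_remote_job on "blastn -query /galaxy/d1.dat
##RUN_LOCALLY## sh /galaxy/metadata.sh" with the mapping
{"/galaxy/d1.dat": "/scratch/d1.dat"} and the (made-up) host grid.example.org
gives an ssh argument list whose command refers to /scratch/d1.dat, while
"sh /galaxy/metadata.sh" comes back unchanged for local execution.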
So, here's one hack. I hijacked the hidden data tool parameter: as far as I can
tell, hidden data is only used for Cufflinks, so it seemed safe. I use it to
send the shell script that will be run on the server (but NOT sent to the
worker nodes). It needed to be a DATA type so that my stager would pick it up
and send it over, and I wanted it hidden because it is only used by the tool
and should not need to be an HDA. I also made changes to allow the value of the
hidden data to be set in the tool; this value becomes the false_path of the
data, which then becomes its actual path.
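To make the hidden-data hack a bit more concrete, here is a small Python model
of the behaviour described above. It is a sketch, not Galaxy's implementation:
DataParam, set_hidden_value and paths_to_stage are hypothetical names. It only
models two points from the description: a value supplied by the tool becomes
the parameter's false_path (and therefore its effective path), and a stager
that walks DATA-typed parameters picks the hidden script up without it ever
being an HDA.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataParam:
    """Hypothetical stand-in for a DATA-typed tool parameter value."""
    name: str
    file_name: str
    hidden: bool = False
    false_path: Optional[str] = None

    @property
    def path(self) -> str:
        # When a false_path is set, it is used as the dataset's actual path.
        return self.false_path if self.false_path is not None else self.file_name


def set_hidden_value(param: DataParam, tool_value: str) -> None:
    """Let the tool supply the hidden data's value; it becomes the false_path."""
    if not param.hidden:
        raise ValueError(f"{param.name} is not a hidden data parameter")
    param.false_path = tool_value


def paths_to_stage(params):
    # The stager only looks at DATA params, which is why the script has to be
    # DATA typed; non-data values (plain strings here) are skipped.
    return [p.path for p in params if isinstance(p, DataParam)]
```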
Please have a look and ask questions, and if improvements are needed before
anything is considered for pulling, let me know. I'd like to present this at
the Galaxy conference without having vegetables thrown at me. Thanks!
National Center for Genome Analysis Support