Unfortunately, I don't have a complete answer to your question, but I can
provide some suggestions and information that may help. The next three
paragraphs are an attempt at practical help; the rest is musings and potential
theoretical help specifically concerning the last idea in your email.
My first thought is to dig into the implementation of the "Get Data -> Upload
File" tool (tool_id=upload1), specifically to examine how it handles composite
datasets. I think the parameters (like multiple file uploads or setting
metadata values) are automatically generated based on the datatype's
"MetadataElement"s. In particular, see exactly how the "set_in_upload"
argument to MetadataElement works. I haven't had time to dig in detail into
how the tool interface is created in that case, so I can't promise that an
answer is there, though I think that is very likely. I also think it's likely
that if there is an answer, it may be non-trivial and/or messy to generalize it
to your case.
My second thought is that you will need to add new tool config tags. I've been
looking into how to add a couple of my own to allow the interactive behavior of
a tool's form to be more dynamic, in a controlled way. So far, I've identified
the following areas that I would have to modify to implement new tags:
- lib/galaxy/tools/__init__.py : update_state, parse_input_elem
- lib/galaxy/tools/actions/__init__.py : DefaultToolAction : execute
- lib/galaxy/tools/parameters/__init__.py : visit_input_values
- lib/galaxy/tools/parameters/ : which of the files in this directory you
  need to modify will depend on your tag
To support testing, also (probably more than):
- lib/galaxy/tools/test.py : ToolTestBuilder
To support workflows also (probably more than):
Hopefully that information is of some use, at least if you're looking for a
place to start.
The Rgenetics / Rexpression tools may also be worth examining, as they use
metadata a fair bit, though not quite in the way you've described. And, the
Finally, I'm intrigued by your idea of generating a tool definition file
on-the-fly. JIT tools, heh. I suppose one way to accomplish this would be to
have a primary tool that uses the conventional mechanisms to take just enough
information (like the first datasets of whose metadata your secondary tool
would be a function) to bootstrap and generate the secondary tool as a function
of the metadata. The primary tool could then trigger Galaxy to load the
secondary tool and (optimally) transparently redirect the user's browser to
that tool. Obviously, this approach could be iterated if necessary.
This is just an idea though. Implementing it would be more difficult than it
sounds, because you'd have to find a way to get your generated tool into
Galaxy's "toolbox" in the first place. Each invocation of the primary tool
would have to produce a secondary tool with a different path and tool_id, in
order to avoid race conditions when two users run the primary tool at once.
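
To make the idea a little more concrete, here is a minimal sketch of what the
primary tool's generation step might look like, using only the Python standard
library. The function name build_secondary_tool, the "foo_secondary_" id prefix
and the choice of "text" / "float" parameter types are my own assumptions for
illustration, not anything Galaxy provides. A real tool definition would also
need at least a command and outputs section, which I've left out.

```python
import uuid
import xml.etree.ElementTree as ET


def build_secondary_tool(descfields, quantfields):
    """Return (tool_id, xml_text) for a generated tool definition.

    Hypothetical helper: the field lists are assumed to come from the
    metadata of the dataset handed to the primary tool.
    """
    # A fresh id per invocation, so two users running the primary tool
    # at the same time never produce the same tool_id.
    tool_id = "foo_secondary_" + uuid.uuid4().hex
    tool = ET.Element("tool", id=tool_id, name="Generated foo tool")
    inputs = ET.SubElement(tool, "inputs")
    # One free-text parameter per descriptive field (assumed type).
    for field in descfields:
        ET.SubElement(inputs, "param", name=field, type="text", label=field)
    # One numeric parameter per quantitative field (assumed type).
    for field in quantfields:
        ET.SubElement(inputs, "param", name=field, type="float",
                      value="0", label=field)
    return tool_id, ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    tool_id, xml_text = build_secondary_tool(
        ["label", "description"], ["qualityscore", "othernumericvalue"])
    print(tool_id)
    print(xml_text)
```

The same unique id could be reused in the file name of the generated XML, which
would take care of the "different path" requirement as well.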
Even if that is solved satisfactorily, there is still a potential race
condition and/or scaling issue. The ToolBox is a single entity, global to the
Galaxy instance, so there may be a race condition on addition / removal of
secondary tools. Perhaps this is taken care of by the ORM or some other part
of the existing design (I don't know enough about the ToolBox's implementation
here), but even with concurrency-safe ToolBox operations, there may be a
scaling issue. After all, it is accessed pretty frequently.
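
For what it's worth, the kind of concurrency-safe add / remove I have in mind
looks roughly like the sketch below. GeneratedToolRegistry is a hypothetical
stand-in that I made up for illustration; it is not Galaxy's ToolBox and says
nothing about how the ToolBox is actually implemented. It also shows where the
scaling worry comes from: every add and remove has to pass through one lock.

```python
import threading


class GeneratedToolRegistry:
    """Hypothetical stand-in for a global toolbox (not Galaxy's ToolBox)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tools = {}  # tool_id -> xml_text

    def add(self, tool_id, xml_text):
        # Check and insert under one lock, so two threads cannot both
        # register the same tool_id.
        with self._lock:
            if tool_id in self._tools:
                raise KeyError("duplicate tool_id: %s" % tool_id)
            self._tools[tool_id] = xml_text

    def remove(self, tool_id):
        # Returns the removed definition, or None if it was already gone.
        with self._lock:
            return self._tools.pop(tool_id, None)

    def __len__(self):
        with self._lock:
            return len(self._tools)


if __name__ == "__main__":
    registry = GeneratedToolRegistry()
    workers = [
        threading.Thread(target=registry.add, args=("tool_%d" % i, "<tool/>"))
        for i in range(50)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    print(len(registry))
```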
Next, there are the related issues of whether and how to "clean up" these
generated tools once they've been run, and how to prevent them from cluttering
up the global toolbox namespace for the whole Galaxy instance. Is there any
kind of permissions mechanism for tools (like there is for libraries, for
instance) that could be used to prevent each user's generated tools from
appearing in each other user's "Tools" menu? Perhaps that could be written.
At first glance, I imagine it would be best for the generated tools to be "use
once and throw away" and private to the user who ran the primary tool or simply
not accessible directly to any user except via the primary tool's one-time
Working with autogenerated tools, you'd also have to be very precise and
careful about versioning the primary tool and all of its dependencies, whether
data, library, or executable. Otherwise, ordinary debugging, and especially
triage of reported bugs, will over time become somewhere between a huge pain
and completely infeasible.
In the end, the JIT tool approach is probably going to be a lot more difficult
and a lot more work than just augmenting Galaxy internals to provide the
features you're looking for. On the other hand, I expect that such a
modification of Galaxy's core code would have to be extensive and involve
central / foundational code, thereby dramatically raising the likelihood of
difficulty integrating Galaxy updates in the future. The JIT approach may be a
bit more respectful of the Galaxy core, though by just how much depends on how
invasively you may need to modify the ToolBox to support online adding /
removing of tools and internal control of user-based tool permissions. Some of
this may already be in the works to support ToolShed. Imho, the JIT approach
is inherently cooler, even if potentially more challenging to get right.
Best of luck, and let us know how it goes :)
From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu]
on behalf of Rodriguez, Aaron (NIH/NCI) [C] [rodrigue...@mail.nih.gov]
Sent: Thursday, November 17, 2011 10:32 AM
Subject: [galaxy-dev] Dynamic tool configuration
I'm looking to add a tool that works with a custom datatype that would
dynamically generate input parameter options based on the dataset metadata.
A dataset of type foo contains metadata as follows:
descfields = ['label','description']
quantfields = ['qualityscore','othernumericvalue']
These values are parsed directly out of the dataset and stored into the
metadata via the foo datatype class. However the number of values within the
list could vary among datasets of type foo.
Now I'd like to configure a tool that generates an input parameter for each of the
descfields values in the list as well as for each of the quantfields values in
I understand that this may be outside of the scope of the current tool syntax
but if anyone could provide some direction on how tools can be made more
'dynamic' using their metadata it would be greatly appreciated. One idea was
to dynamically generate the <tool>.xml and dynamically load it upon request,
but I'm not sure if this would integrate well.
Thanks for your feedback!