Hi Aaron,

Unfortunately, I don't have a complete answer to your question, but I can 
provide some suggestions and information that may help.  The next 3 paragraphs 
are an attempt at practical help; the rest is musings and potential 
theoretical help, specifically concerning the last idea in your email.  Anyway, 
the practical...

My first thought is to dig into the implementation of the "Get Data -> Upload 
File" tool (tool_id=upload1), specifically to examine how it handles composite 
datasets.  I think the parameters (like multiple file uploads or setting 
metadata values) are automatically generated based on the datatype's 
"MetadataElement"s.  In particular, see exactly how the "set_in_upload" 
argument to MetadataElement works.  I haven't had time to dig in detail into 
how the tool interface is created in that case, so I can't promise that an 
answer is there, though I think that is very likely.  I also think it's likely 
that if there is an answer, it may be non-trivial and/or messy to generalize it 
to your case.

My second thought is that you will need to add new tool config tags.  I've been 
looking into how to add a couple of my own to allow the interactive behavior of 
a tool's form to be more dynamic, in a controlled way.  So far, I've identified 
the following areas that I would have to modify to implement new tags:
  - lib/galaxy/tools/__init__.py : update_state, parse_input_elem,
    check_and_update_param_values_helper,
    handle_unvalidated_param_values_helper
  - lib/galaxy/tools/actions/__init__.py : DefaultToolAction : execute
  - lib/galaxy/tools/parameters/__init__.py : visit_input_values
  - lib/galaxy/tools/parameters/ : which of the files in this directory you
    need to modify will depend on your tag
To support testing, you would also need to modify (at least, and probably more 
than):
  - lib/galaxy/tools/test.py : ToolTestBuilder
To support workflows, you would also need to modify (at least, and probably 
more than):
  - lib/galaxy/workflow/modules.py
Hopefully that information is of some use, at least if you're looking for a 
place to start.

The Rgenetics / Rexpression tools may also be worth examining, as they use 
metadata a fair bit, though not quite in the way you've described.  And, the 

Finally, I'm intrigued by your idea of generating a tool definition file 
on-the-fly.  JIT tools, heh.  I suppose one way to accomplish this would be to 
have a primary tool that uses the conventional mechanisms to take just enough 
information (like the first datasets of whose metadata your secondary tool 
would be a function) to bootstrap and generate the secondary tool as a function 
of the metadata.  The primary tool could then trigger Galaxy to load the 
secondary tool and (optimally) transparently redirect the user's browser to 
that tool.  Obviously, this approach could be iterated if necessary.
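
To make the generation step a bit more concrete, here is a minimal Python 
sketch, and only a sketch.  It assumes the metadata lists from your example 
(descfields, quantfields) and emits one <param> per field.  The function name, 
the tool id, and the "desc_" / "quant_" name prefixes are all made up for 
illustration, and a real tool config would also need <command> and <outputs> 
sections, which I've left out.

```python
import xml.etree.ElementTree as ET


def build_secondary_tool_xml(tool_id, descfields, quantfields):
    """Return tool config XML text with one input param per metadata field.

    Hypothetical helper, not part of Galaxy.  Only the <inputs> section is
    generated; <command> and <outputs> are deliberately omitted.
    """
    tool = ET.Element("tool", {"id": tool_id, "name": "Generated: " + tool_id})
    inputs = ET.SubElement(tool, "inputs")

    # One free-text parameter per descriptive field.
    for field in descfields:
        ET.SubElement(inputs, "param", {
            "name": "desc_" + field,
            "type": "text",
            "label": field,
        })

    # One numeric parameter per quantitative field.
    for field in quantfields:
        ET.SubElement(inputs, "param", {
            "name": "quant_" + field,
            "type": "float",
            "value": "0",
            "label": field,
        })

    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_secondary_tool_xml(
        "foo_secondary_example",
        ["label", "description"],
        ["qualityscore", "othernumericvalue"],
    ))
```

Getting Galaxy to actually load the resulting file is the hard part, and this 
sketch does not address it.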

This is just an idea though.  Implementing it would be more difficult than it 
sounds, because you'd have to find a way to get your generated tool into 
Galaxy's "toolbox" in the first place.  Each invocation of the primary tool 
would have to produce a secondary tool with a different path and tool_id, in 
order to avoid race conditions when two users run the primary tool at once.  
Even if that is solved satisfactorily, there is still a potential race 
condition and/or scaling issue.  The ToolBox is a single entity, global to the 
Galaxy instance, so there may be a race condition on addition / removal of 
secondary tools.  Perhaps this is taken care of by the ORM or some other part 
of the existing design (I don't know enough about the ToolBox's implementation 
here), but even with concurrency-safe ToolBox operations, there may be a 
scaling issue.  After all, it is accessed pretty frequently.
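
As a rough illustration of the two concerns above (unique ids/paths per 
invocation, and guarded add/remove), here is a small Python sketch.  The 
GeneratedToolRegistry class is a hypothetical stand-in that I made up; it is 
not Galaxy's ToolBox and does not reflect how the ToolBox is implemented.  
Note also that a threading.Lock only protects threads within one process, so 
it would not be enough if Galaxy is run as several processes.

```python
import os
import threading
import uuid


class GeneratedToolRegistry:
    """Hypothetical stand-in for a toolbox; NOT Galaxy's ToolBox class."""

    def __init__(self, base_dir):
        self._base_dir = base_dir
        self._lock = threading.Lock()
        self._tools = {}  # tool_id -> path of the generated config file

    def register(self, primary_tool_id):
        """Reserve a fresh tool_id and path for one primary-tool invocation."""
        # A random UUID per invocation keeps concurrent users from colliding.
        tool_id = "%s_generated_%s" % (primary_tool_id, uuid.uuid4().hex)
        path = os.path.join(self._base_dir, tool_id + ".xml")
        with self._lock:
            self._tools[tool_id] = path
        return tool_id, path

    def unregister(self, tool_id):
        """Remove a generated tool; returns its path, or None if unknown."""
        with self._lock:
            return self._tools.pop(tool_id, None)


if __name__ == "__main__":
    registry = GeneratedToolRegistry("/tmp/generated_tools")
    print(registry.register("foo_primary"))
```

This only shows the bookkeeping; it says nothing about the cost of touching a 
frequently accessed global structure, which is the scaling worry.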

Next, there are the related issues of whether and how to "clean up" these 
generated tools once they've been run, and how to prevent them from cluttering 
up the global toolbox namespace for the whole Galaxy instance.  Is there any 
kind of permissions mechanism for tools (like there is for libraries, for 
instance) that could be used to prevent each user's generated tools from 
appearing in each other user's "Tools" menu?  Perhaps that could be written.

At first glance, I imagine it would be best for the generated tools to be "use 
once and throw away" and private to the user who ran the primary tool or simply 
not accessible directly to any user except via the primary tool's one-time 

Working with autogenerated tools, you'd also have to be very precise and 
careful about versioning the primary tool and all of its dependencies, whether 
data, library, or executable.  Otherwise, ordinary debugging, and especially 
triage of reported bugs, will over time become somewhere between a huge pain 
and completely infeasible.
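
One way to keep that manageable might be to stamp each generated tool with a 
fingerprint of everything it was generated from, so a bug report can be traced 
back to an exact combination of versions.  A minimal sketch, assuming you can 
collect the versions yourself; the function name and the shape of the record 
are my own invention, not anything Galaxy provides:

```python
import hashlib
import json


def provenance_fingerprint(primary_tool_version, dependency_versions):
    """Return a stable SHA-256 hex digest for a set of versions.

    Hypothetical helper.  dependency_versions is a dict mapping a dependency
    name (data, library, or executable) to its version string.
    """
    record = {
        "primary_tool_version": primary_tool_version,
        "dependencies": dependency_versions,
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(record, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(provenance_fingerprint("1.0.0", {"R": "2.13.1", "refdata": "v3"}))
```

The fingerprint could be embedded in the generated tool's id or version so 
that it shows up wherever the tool is reported.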

In the end, the JIT tool approach is probably going to be a lot more difficult 
and a lot more work than just augmenting Galaxy internals to provide the 
features you're looking for.  On the other hand, I expect that such a 
modification of Galaxy's core code would have to be extensive and involve 
central / foundational code, thereby dramatically raising the likelihood of 
difficulty integrating Galaxy updates in the future.  The JIT approach may be a 
bit more respectful of the Galaxy core, though by just how much depends on how 
invasively you may need to modify the ToolBox to support online adding / 
removing of tools and internal control of user-based tool permissions.  Some of 
this may already be in the works to support ToolShed.  Imho, the JIT approach 
is inherently cooler, even if potentially more challenging to get right.

Best of luck, and let us know how it goes :)


From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
on behalf of Rodriguez, Aaron (NIH/NCI) [C] [rodrigue...@mail.nih.gov]
Sent: Thursday, November 17, 2011 10:32 AM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] Dynamic tool configuration


I'm looking to add a tool that works with a custom datatype that would 
dynamically generate input parameter options based on the dataset metadata.

For example,

A dataset of type foo contains metadata as follows:

descfields = ['label','description']
quantfields = ['qualityscore','othernumericvalue']

These values are parsed directly out of the dataset and stored into the 
metadata via the foo datatype class.  However the number of values within the 
list could vary among datasets of type foo.

Now I'd like to configure a tool that generates an input parameter for each of 
the descfields values in the list as well as for each of the quantfields values 
in its list.

I understand that this may be outside of the scope of the current tool syntax 
but if anyone could provide some direction on how tools can be made more 
'dynamic' using their metadata it would be greatly appreciated.  One idea was 
to dynamically generate the <tool>.xml and dynamically load it upon request, 
but I'm not sure if this would integrate well.

Thanks for your feedback!


