Hi all,

I think what we have are two similar, but somewhat separate, problems:
1.) We need a way for an admin to add configuration entries to data tables / 
.loc files through the UI.
2.) We need a way to bootstrap/initialize a Galaxy installation with data 
table / .loc file entries ('built-in data') during installation for 
        a.) a 'production' Galaxy instance - this would include local 
dev/testing/etc. instances
        b.) the automated testing framework - tests should run fast but still 
meaningfully test a tool, e.g., the horse mitochondrial genome could be a fine 
built-in genome for running automated tool tests, but it should not be 
automatically installed into a production Galaxy instance

For 1.), we now have Data Managers. A Data Manager does all the heavy 
lifting of adding data table entries; e.g., for bwa, it can build the 
mapping indexes and add the properly delimited line to the .loc file. Data 
Managers are accessed through the admin interface, under Manage local data. 
They are installed from a ToolShed, or can be installed manually. In addition 
to direct interactive usage, Data Manager tools can be included in workflows 
or accessed via the tools API. Not only does using a Data Manager remove the 
technical burdens/concerns of adding new entries to a data table / .loc file, 
it also provides the same reproducibility and provenance tracking that is 
afforded to regular Galaxy tools. The documentation for Data Managers is 
currently limited to the tutorial-style doc here: 
http://wiki.galaxyproject.org/Admin/Tools/DataManagers/HowTo/Define; a more 
formal / config-syntax type of page will also be made available, although the 
tutorial is a pretty inclusive description of the steps needed to define a 
Data Manager.
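
To make the "properly delimited line" concrete, here is a minimal Python 
sketch of the last step a bwa Data Manager saves the admin from doing by hand. 
It assumes a tab-separated column order of value, dbkey, name, path for the 
bwa .loc file; the function names and the example path are made up for 
illustration, and a real Data Manager goes through Galaxy's data table 
handling rather than writing the file directly.

```python
def format_loc_line(value, dbkey, name, path):
    """Build one tab-delimited .loc line (assumed columns: value, dbkey, name, path)."""
    fields = [value, dbkey, name, path]
    for field in fields:
        # A stray tab or newline would silently shift or split the columns.
        if "\t" in field or "\n" in field:
            raise ValueError("field contains a tab or newline: %r" % (field,))
    return "\t".join(fields) + "\n"


def append_loc_entry(loc_path, value, dbkey, name, path):
    """Append a new entry to the .loc file and return the line that was written."""
    line = format_loc_line(value, dbkey, name, path)
    with open(loc_path, "a") as handle:
        handle.write(line)
    return line


if __name__ == "__main__":
    # Hypothetical file name and index location, for illustration only.
    append_loc_entry(
        "bwa_index.loc",
        "hg19",
        "hg19",
        "Human - hg19",
        "/galaxy/data/hg19/bwa_index/hg19.fa",
    )
```

Doing this by hand is exactly where mistakes creep in (spaces instead of tabs, 
a missing column), which is why the Data Manager should own it.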

For 2.): bootstrapping data during an installation process is something that 
still needs to be more completely spec'd out and implemented. This 
bootstrapping process should be able to make use of the Data Managers or 
download/move/utilize pre-built configurations. (A Data Manager's underlying 
action can itself be a download, e.g. the fetch genomes data

Let's start by considering the users' point of view. We have two types of 
users, GalaxyAdmin and ToolDev, and will use a BWA tool as an example.
        Clicks buttons to install tool suite that includes the BWA tool and a 
BWA indexer Data Manager. (so far there is no change from how it works now)
        The Galaxy installer methodology recognizes that it is possible to add 
built-in data:
                Some preassembled mapping indexes are available (pre-built 
                Mapping indexes can be created for any entry in the all_fasta 
data table.
        The User clicks checkboxes/multiple selects for preassembled data to 
download and also selects the fasta entries to be indexed with the Data Manager 

        In the ToolShed repository, the ToolDev needs to provide a description 
of a and b; for simplicity we can assume b is a subset of a, but with a 
different attribute/flag (e.g. test_only, 'real', both) or perhaps a different 
filename; abstractly, they are the same thing, just run at different times, 
with the testing ones not requiring user interaction/selection.
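
One way to picture the flag idea, as a rough Python sketch: every built-in 
data entry carries a scope, and the installer and the test framework simply 
filter on it. The scope names are the placeholder values mentioned above; the 
entry structure, the function name, and the horse dbkey are hypothetical.

```python
from collections import namedtuple

# Hypothetical description of one piece of built-in data.
BuiltInData = namedtuple("BuiltInData", ["dbkey", "description", "scope"])

# Which scope flags each run mode accepts (placeholder flag names).
SCOPES_BY_MODE = {
    "production": ("real", "both"),
    "testing": ("test_only", "both"),
}


def entries_for(entries, mode):
    """Return the entries that apply to the given run mode."""
    if mode not in SCOPES_BY_MODE:
        raise ValueError("unknown mode: %r" % (mode,))
    wanted = SCOPES_BY_MODE[mode]
    return [entry for entry in entries if entry.scope in wanted]


if __name__ == "__main__":
    described = [
        BuiltInData("hg19", "Human - hg19", "real"),
        BuiltInData("horse_chrM", "Horse mitochondrial genome", "test_only"),
    ]
    # Production: offered to the admin as selections.
    print([e.dbkey for e in entries_for(described, "production")])
    # Testing: installed without any user interaction.
    print([e.dbkey for e in entries_for(described, "testing")])
```

The same description file then serves both cases; only the mode and whether a 
user is asked to choose differ.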

So, the real question becomes: what does this description look like? It is 
probably an XML file; for now let's call it '__data_table_bootstrap__.xml' 
(alternatively, we can roll it directly into the existing data_manager_conf.xml 
files in the toolshed, although for a list of static downloads, we don't need 
an actual data manager tool). It could look something like this (quick and 
dirty pass, elements and values are made up):

                <data_table name="bwa_index" production="True" testing="False"> 
<!--both would default to True -->
                         <data_manager id="bwa_indexer">
name="list_of_fasta_files_from_all_fasta"/> <!-- corresponds to the all_fasta 
parameter value in the data manager, cycles over each of the fasta values to 
provide selections -->
                        <download code="script_for_prebuilt.py"> <!-- could be 
static list or some dynamically determined listing -->
                                <available method="get_list" /> <!-- returns 
sets of parameter values for available data to download-->
                                <fetch method="get_indexes" /> <!-- takes the 
values selected from above available, returns list of URI source and relative 
                        <download> <!-- could be static list or some 
dynamically determined listing -->
                                        <field name="dbkey" value="hg19"/>
                                        <field name="description" value="Human 
- hg19"/>
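
To give a feel for what script_for_prebuilt.py might contain, here is a rough 
Python sketch of the two hooks named in the XML above. Everything in it is 
made up to match that quick-and-dirty pass: the function signatures, the 
return shapes (I am assuming the fetch hook returns pairs of source URI and a 
relative destination path), and the example.org URL are assumptions, not an 
existing Galaxy API.

```python
# Hypothetical script_for_prebuilt.py: a static listing of pre-built indexes.
# The listing could just as well be determined dynamically (e.g. from a server).
PREBUILT = {
    "hg19": {
        "description": "Human - hg19",
        "source": "http://example.org/prebuilt/bwa/hg19.tar.gz",  # placeholder URL
        "relative_path": "hg19/bwa_index",  # assumed destination layout
    },
}


def get_list():
    """'available' hook: return parameter sets for data that can be downloaded."""
    return [
        {"dbkey": dbkey, "description": info["description"]}
        for dbkey, info in sorted(PREBUILT.items())
    ]


def get_indexes(selected):
    """'fetch' hook: map the selected dbkeys to (source URI, relative path) pairs."""
    unknown = [dbkey for dbkey in selected if dbkey not in PREBUILT]
    if unknown:
        raise KeyError("no pre-built data for: %s" % ", ".join(unknown))
    return [
        (PREBUILT[dbkey]["source"], PREBUILT[dbkey]["relative_path"])
        for dbkey in selected
    ]


if __name__ == "__main__":
    available = get_list()
    # In the installer, the admin would tick checkboxes; here we take everything.
    chosen = [entry["dbkey"] for entry in available]
    for source, relative_path in get_indexes(chosen):
        print("would download %s into %s" % (source, relative_path))
```

The installer would call the first hook to build the checkbox list and the 
second to find out what to download for the boxes that were ticked.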

Any thoughts?



On Oct 8, 2013, at 4:26 PM, Guest, Simon wrote:

>> I look forward to some more details from Dan on *.loc
>> file setup.
> Hi Peter, Dan and all,
> What a timely discussion!  I am just in the process of setting up loc files 
> for some new indexes I have created (bowtie2, etc), and would really like to 
> see this automated.
> I see there is a Galaxy script scripts/loc_files/create_all_fasta_loc.py, 
> which is quite sophisticated, and does this job nicely for all_fasta.loc.  
> I'm feeling an urge to somehow extend this script to cope with other 
> datatypes besides fasta, but am wondering if this will be wasted effort if 
> there will soon be a better way to handle this.
> Can Dan or anyone else comment on this?
> cheers,
> Simon

On Oct 8, 2013, at 11:25 AM, Peter Cock wrote:

> On Tue, Oct 8, 2013 at 4:13 PM, Greg Von Kuster <g...@bx.psu.edu> wrote:
>>>> I don't agree with this - the sample files should be used as guidance for
>>>> the admin to create functionally correct .loc files.  This is the same
>>>> approach used for all Galaxy .sample files (e.g., 
>>>> universe_wsgi.ini.sample <-> universe_wsgi.ini, etc.)
>>> Why then does the tool_conf.xml.sample file get used by the
>>> test framework? This is a clear example of *.xml.sample
>>> being used in the test framework over the 'real' file *.xml.
>>> I really don't understand this design choice - I would use
>>> tool_conf.xml (it lists the tools actually installed on our Galaxy,
>>> and therefore the things worth testing) while by default
>>> tool_conf.xml.sample includes a whole load of things where
>>> the binaries etc are missing and so the tests will fail (hiding
>>> potential real failures in the noise).
>> I'm not quite sure of the reason for this as I didn't make this
>> design choice - I'm sure "ancient Galaxy history" plays a role
>> in this decision.
> Probably ;)
>>> Perhaps rather than overloading *.loc.sample with two roles
>>> (sample configuration/documentation and unit tests), we
>>> need to introduce *.loc.test for functional testing purposes?
>> I'm hoping we don't have to go this route as we have so many
>> priorities.  If you would like this implemented though, please
>> add a new Trello card and we'll consider it.
> Filed: 
> https://trello.com/c/P90b5Pa0/1165-functional-tests-need-separate-loc-files-to-the-live-production-loc-files-e-g-loc-test
>>> That still leaves open the question of how best to install
>>> the test databases or files that the *.loc.test file would
>>> point at for running functional tests.
>> Yes!
> I look forward to some more details from Dan on *.loc
> file setup.
> Thank you,
> Peter
