Hi Peter,

Please see replies inline, below.


Thanks,

Dan


On Oct 17, 2013, at 5:36 AM, Peter Cock wrote:

> Hi Dan,
> 
> On Tue, Oct 15, 2013 at 7:40 PM, Daniel Blankenberg <d...@bx.psu.edu> wrote:
>> Hi all,
>> 
>> I think what we have are two similar, but somewhat separate problems:
>> 1.) We need a way via the UI for an admin to be able to add additional
>> configuration entries to data tables / .loc files.
>> 
>> For 1.), we now have Data Managers. A Data Manager will do all the
>> heavy lifting of adding additional data table entries. e.g. for bwa, it can
>> build the mapping indexes and add the properly delimited line to the
>> .loc file. These are accessed through the admin interface, under Manage
>> local data. Data Managers are installed from a ToolShed, or can be
>> installed manually. In addition to direct interactive usage, Data Manager
>> tools can be included in workflows or accessed via the tools API. Not
>> only does the use of a Data Manager remove the technical burdens/
>> concerns of adding new entries to a data table / .loc file, it also provides
>> for the same reproducibility and provenance tracking that is afforded
>> to regular Galaxy tools.
> 
> You said that Data Managers can be used within a workflow.
> I don't quite follow - aren't the Data Managers restricted to
> administrators only?

This is correct. Admins can run workflows containing Data Managers, while 
standard users cannot. Additionally, the selection list for any installed Data 
Managers will only appear within the workflow editor for an admin.



> If you don't mind me picking two specific examples of direct
> personal interest - which lead me to ask if there is a default
> Data Manager which just offers a web GUI for editing any *.loc
> file as a table?

Something like this for adding entries could be done now, although existing 
entries currently cannot be modified or removed using Data Managers. No generic 
Data Manager that does this has been written yet, though.

On my list of things to do is to write a Data Manager that would generically 
make use of our datacache rsync server, but there is not an ETA for this. 
Another one, or the same one, could also make use of S3, which would be 
particularly useful for Cloud instances.


> --
> 
> Blast2GO - http://toolshed.g2.bx.psu.edu/view/peterjc/blast2go
> This tool wrapper uses blast2go.loc which should list one or more
> Blast2GO *.properties files. These can in principle be used for
> advanced things like changing evidence weighting codes etc.
> However, the primary point is to point to different Blast2GO
> databases.
> 
> There have been a series of (date stamped) public (free) Blast2GO
> databases, and my tool installation script already sets up the
> *.properties files for the most recent databases (which it uses
> for a unit test), which was your point 2 (below).
> 
> The local Galaxy administrator may need to add extra entries
> to the blast2go.loc file, for instance when there is a new public
> database release, or if they set up a local database (recommended).
> 
> This seems to be an easy case (since there is little that we can
> automate). A simple interface for adding lines to the *.loc files
> would be enough, assuming it includes a file select browser.

In this case, you could define a blast2go Data Manager that allows the 
selection of the external public (free) Blast2GO database that the user wants. 
A code file could be used to populate this list dynamically from the external 
server's contents until a more generalized way of doing so is made available to 
tool parameters. The underlying Data Manager tool would then retrieve the 
database and return a JSON description of the fields to add to the data table 
.loc file.
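
As a rough illustration of that last step, here is a minimal sketch of the kind 
of JSON a Data Manager tool could emit. The table name "blast2go", the column 
names (value, name, path), and the example id and path are all assumptions made 
up for illustration; the columns have to match whatever the data table actually 
declares. This is not taken from an existing Data Manager.

```python
import json


def build_data_table_entry(db_id, description, properties_path):
    """Return one new row for a (hypothetical) 'blast2go' data table."""
    # Assumed column names; they must match the table's declared columns.
    return {"value": db_id, "name": description, "path": properties_path}


def build_data_manager_json(table_name, entries):
    """Wrap new rows in a 'data_tables' mapping keyed by table name."""
    return {"data_tables": {table_name: list(entries)}}


if __name__ == "__main__":
    # Made-up example values, for illustration only.
    entry = build_data_table_entry(
        "b2g_example",
        "Example Blast2GO database",
        "/data/blast2go/b2g_example.properties",
    )
    print(json.dumps(build_data_manager_json("blast2go", [entry]), indent=2))
```

The point of returning a description rather than editing the .loc file 
directly is that the tool never needs to know where the file lives or how it is 
delimited.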

This same Data Manager could also be allowed to add a file locally from the 
server's filesystem. We don't have a filesystem select widget for tools yet, 
but you could use a textbox for manual entry or use a select list/drill down 
with dynamic code for this. A ServerFileToolParameter could be defined to list 
server contents directly, but we would want to make sure that ordinary tool 
devs are aware of it being a bit of a security risk, depending upon how it is 
used (we usually don't want ordinary users selecting random files off of the 
filesystem in normal tools).
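
To make the select-list idea a little more concrete, here is a minimal sketch 
of a helper that a dynamic code file could call to list candidate files. The 
function name, the ".properties" filter, and the idea of confining the listing 
to a single admin-chosen root directory are my assumptions, and it assumes the 
(display name, value, selected) tuple convention for select options.

```python
import os
import tempfile


def list_server_files(root, extension=".properties"):
    """List files under root as (display name, value, selected) tuples."""
    root = os.path.realpath(root)
    options = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(extension):
                continue
            full_path = os.path.realpath(os.path.join(dirpath, filename))
            # Skip symlinks that resolve to somewhere outside the allowed root.
            if os.path.commonpath([root, full_path]) != root:
                continue
            display = os.path.relpath(full_path, root)
            options.append((display, full_path, False))
    return sorted(options)


if __name__ == "__main__":
    # Demonstrate against a throwaway directory rather than a real server path.
    with tempfile.TemporaryDirectory() as demo_root:
        open(os.path.join(demo_root, "example.properties"), "w").close()
        open(os.path.join(demo_root, "notes.txt"), "w").close()
        for option in list_server_files(demo_root):
            print(option)
```

Confining the walk to one root and re-checking resolved paths is only a partial 
answer to the security concern; who is allowed to see the list still matters.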

It may be worthwhile to have a look at the Reference Genome / all_fasta data 
manager 
(http://testtoolshed.g2.bx.psu.edu/view/blankenberg/data_manager_fetch_genome_all_fasta), 
which can grab reference genome FASTAs from UCSC, NCBI, a URL, a Galaxy 
History, or a Directory on the server (copy or symlink) and then populate the 
all_fasta table.



> --
> 
> BLAST+ - http://toolshed.g2.bx.psu.edu/view/devteam/ncbi_blast_plus/
> This uses blastdb.loc (nucleotides), blastdb_p.loc (proteins) etc.
> A simple interface for adding lines to the *.loc files would be
> useful, although the oddities of BLAST database naming might
> need a little code on top of a plain file select browser (the database
> name is the file path stem without the *.nal, *.pal, etc extension).
> 
> There is potential for offering to automatically create databases
> from this all_fasta data table you mention below?


The BWA index data manager 
(http://testtoolshed.g2.bx.psu.edu/view/blankenberg/data_manager_bwa_index_builder) 
uses the genomes available under all_fasta for building the mapping indexes.
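
A BLAST equivalent could follow the same pattern. As a sketch only (this is not 
how the BWA Data Manager is implemented), the code below reads all_fasta-style 
lines and builds, without running, a makeblastdb command for each entry. The 
column order (value, dbkey, name, path), the nucleotide-only assumption, and the 
output directory layout are assumptions for illustration.

```python
import shlex


def parse_loc_line(line, columns=("value", "dbkey", "name", "path")):
    """Parse one tab-delimited .loc line into a dict, or None if skipped."""
    line = line.rstrip("\n")
    if not line.strip() or line.startswith("#"):
        return None  # blank line or comment
    fields = line.split("\t")
    if len(fields) != len(columns):
        return None  # malformed line; ignore rather than guess
    return dict(zip(columns, fields))


def makeblastdb_command(entry, output_dir):
    """Build the argument list for a nucleotide makeblastdb run."""
    out_base = output_dir.rstrip("/") + "/" + entry["value"]
    return [
        "makeblastdb",
        "-in", entry["path"],
        "-dbtype", "nucl",
        "-out", out_base,
        "-title", entry["name"],
    ]


if __name__ == "__main__":
    # Made-up all_fasta entry, for illustration only.
    example = "exampleGenome\texampleGenome\tExample genome\t/data/example.fa\n"
    entry = parse_loc_line(example)
    command = makeblastdb_command(entry, "/data/blast")
    print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting base path (the -out value) is what would then go into the BLAST 
data table entry.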


>> The documentation for Data Managers is currently limited to the
>> tutorial-style doc here: 
>> http://wiki.galaxyproject.org/Admin/Tools/DataManagers/HowTo/Define;
>> a more formal / config syntax type of page will also be made available,
>> although the tutorial is a pretty inclusive description of the steps needed
>> to define a Data Manager.
> 
> Could I suggest you add that information (paraphrase what you just
> said in this email) to the main page:
> 
> http://wiki.galaxyproject.org/Admin/Tools/DataManagers
> 
> I think that would help.

Great suggestion, I'll add a bit of this and link to this discussion.


> 
>> 
>> 2.) We need a way to bootstrap/initialize a Galaxy installation with data
>> table/ .loc file entries ('built-in data') during installation for
>>        a.) a 'production' Galaxy instance - this would include local
>>             dev/testing/etc instances
>>        b.) automated testing framework - tests should run fast, but
>>             meaningfully test a tool, e.g., the horse mitochondrial
>>             genome could be a fine built-in genome for running
>>             automated tool tests, but not desired to be automatically
>>             installed into a production Galaxy instance
>> 
>> 
>> For 2.): bootstrapping data during an installation process is something
>> that still needs to be more completely spec'd out and implemented. ...
> 
> OK, so the Data Manager work does not yet cover bootstrapping
> (installing data as part of tool installation from the tool shed etc).
> 
> Regarding 2(b), Greg and I talked about this earlier in the thread and
> I filed Trello Card 1165 on a related issue:
> https://trello.com/c/P90b5Pa0/1165-functional-tests-need-separate-loc-files-to-the-live-production-loc-files-e-g-loc-test

This is a very important feature, especially for the automated testing 
framework. I'll add a comment to the card referencing this thread. If anyone 
wants to help work out the XML spec, I think that would be a great help -- 
IMHO, defining a well-thought-out, solid, flexible XML description is probably 
harder than the actual implementation.


> Thanks,
> 
> Peter

