Greg,

It would be great if there were a way to expand upon the core datatypes using 
the ToolShed.

Would it be possible to have a separate datatype repository within the ToolShed?

Datatype
  name=""
  description=""
  datatype_dependencies=[]
  definition=<python code>

The tool config could be expanded to have a requirement for datatypes:
   <requirement type="datatype">ssmap</requirement>




Table datatype
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+---------------------------------------------------
 id          | integer                     | not null default nextval('datatype_id_seq'::regclass)
 name        | character varying(255)      |
 version     | character varying(40)       |
 description | text                        |
 definition  | text                        |
UNIQUE (name)

Table datatype_datatype_association
   Column    |            Type             |                     Modifiers
-------------+-----------------------------+---------------------------------------------------
 id          | integer                     | not null default nextval('datatype_id_seq'::regclass)
 datatype_id | integer                     |
 requires_id | integer                     |
FOREIGN KEY (datatype_id) REFERENCES datatype(id)
FOREIGN KEY (requires_id) REFERENCES datatype(id)
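
A minimal sketch of how the two tables above could be used to store a datatype and work out the order in which its dependencies need to be loaded. This is only an illustration: it uses an in-memory SQLite database in place of PostgreSQL, and the helper names add_datatype and install_order are hypothetical, not part of Galaxy or the ToolShed.

```python
import sqlite3

# Assumption: SQLite stands in for the PostgreSQL tables described above.
SCHEMA = """
CREATE TABLE datatype (
    id          INTEGER PRIMARY KEY,
    name        VARCHAR(255) UNIQUE,
    version     VARCHAR(40),
    description TEXT,
    definition  TEXT
);
CREATE TABLE datatype_datatype_association (
    id          INTEGER PRIMARY KEY,
    datatype_id INTEGER REFERENCES datatype(id),
    requires_id INTEGER REFERENCES datatype(id)
);
"""


def add_datatype(conn, name, version="", description="", definition="", requires=()):
    """Insert a datatype and link it to the already-stored datatypes it requires."""
    cur = conn.execute(
        "INSERT INTO datatype (name, version, description, definition) "
        "VALUES (?, ?, ?, ?)",
        (name, version, description, definition),
    )
    new_id = cur.lastrowid
    for required in requires:
        row = conn.execute(
            "SELECT id FROM datatype WHERE name = ?", (required,)
        ).fetchone()
        if row is None:
            raise ValueError("unknown required datatype: %s" % required)
        conn.execute(
            "INSERT INTO datatype_datatype_association (datatype_id, requires_id) "
            "VALUES (?, ?)",
            (new_id, row[0]),
        )
    return new_id


def install_order(conn, name, seen=None):
    """Return datatype names in load order, dependencies first.

    Cycles are not handled here; add_datatype only links to datatypes that
    already exist, so it cannot create one.
    """
    seen = [] if seen is None else seen
    rows = conn.execute(
        "SELECT r.name FROM datatype d "
        "JOIN datatype_datatype_association a ON a.datatype_id = d.id "
        "JOIN datatype r ON r.id = a.requires_id "
        "WHERE d.name = ? ORDER BY r.name",
        (name,),
    ).fetchall()
    for (required,) in rows:
        if required not in seen:
            install_order(conn, required, seen)
    if name not in seen:
        seen.append(name)
    return seen


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
add_datatype(conn, "tabular")
add_datatype(conn, "ssmap", "1.0", "Secondary Structure Map", requires=["tabular"])
print(install_order(conn, "ssmap"))
```

Keeping the dependencies in a separate association table means a datatype can require any number of others without changing the datatype table itself.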


Then for my mothur metagenomics tools I could define:

name="ssmap"   description="Secondary Structure Map"  version="1.0"  
datatype_dependencies=[tabular]
definition=
from galaxy.datatypes.tabular import Tabular
class SecondaryStructureMap(Tabular):
    file_ext = 'ssmap'
    def __init__(self, **kwd):
        """Initialize secondary structure map datatype"""
        Tabular.__init__( self, **kwd )
        self.column_names = ['Map']

    def sniff( self, filename ):
        """
        Determine whether the file is in secondary structure map format:
        a single column of integer values, each indicating the row that
        this row maps to.
        Check that the mapping is symmetric: if structMap[10] = 380,
        then structMap[380] = 10.
        """
...
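
The body of sniff is elided above, so the following is only a sketch of the check its docstring describes, not the actual implementation. It assumes rows are numbered from 1 and that a value of 0 marks a row with no partner; the function name looks_like_ssmap is hypothetical.

```python
def looks_like_ssmap(lines):
    """Return True if lines hold one integer per row with symmetric pairings."""
    pairs = []
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split()
        if len(fields) != 1:
            return False  # more than one column
        try:
            pairs.append(int(fields[0]))
        except ValueError:
            return False  # not an integer
    if not pairs:
        return False
    for row, partner in enumerate(pairs, start=1):
        if partner == 0:
            continue  # assumption: 0 means this row is unpaired
        if partner < 1 or partner > len(pairs):
            return False  # points outside the file
        if pairs[partner - 1] != row:
            return False  # partner does not point back to this row
    return True


print(looks_like_ssmap(["2", "1", "0"]))
print(looks_like_ssmap(["2", "3", "0"]))
```

Note that the symmetry check needs the whole file, so a sniffer that only reads the first few lines would have to relax it.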




Then the align.check.xml tool_config could require the 'ssmap' datatype:

<tool id="mothur_align_check" name="Align.check" version="1.19.0">
 <description>Calculate the number of potentially misaligned bases</description>
 <requirements>
   <requirement type="binary">mothur</requirement>
   <requirement type="datatype">ssmap</requirement>
 </requirements>
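
As a rough illustration of how an installer might act on such a requirement, the sketch below reads the datatype requirements out of a tool config and reports which ones are not yet installed. The type="datatype" requirement is the proposal made in this message, not an existing Galaxy feature, and the function names are hypothetical. The closing </tool> tag is added only so the fragment parses.

```python
import xml.etree.ElementTree as ET

TOOL_XML = """
<tool id="mothur_align_check" name="Align.check" version="1.19.0">
 <description>Calculate the number of potentially misaligned bases</description>
 <requirements>
   <requirement type="binary">mothur</requirement>
   <requirement type="datatype">ssmap</requirement>
 </requirements>
</tool>
"""


def required_datatypes(tool_xml):
    """Return the names listed in <requirement type="datatype"> elements."""
    root = ET.fromstring(tool_xml.strip())
    names = []
    for req in root.findall("./requirements/requirement"):
        if req.get("type") == "datatype" and req.text:
            names.append(req.text.strip())
    return names


def missing_datatypes(tool_xml, installed):
    """Return required datatype names that are not in the installed set."""
    return [name for name in required_datatypes(tool_xml) if name not in installed]


print(required_datatypes(TOOL_XML))
print(missing_datatypes(TOOL_XML, {"tabular"}))
```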









John,

I've been following this message thread, and it seems to have gone in a 
direction that differs from your initial question about whether Galaxy could 
automatically edit the datatypes_conf.xml file when certain Galaxy tool shed 
tools are installed.  There are some complexities to consider in attempting 
this.  One issue is that the work of adding support for a new datatype to 
Galaxy lies outside the intended function of the tool shed.  If new support is 
added to the Galaxy code base, an entry for that new datatype should be 
manually added to the table at the same time.  There may be benefits to 
enabling automatic changes to datatype entries that already exist in the file 
(e.g., adding a new converter for an existing datatype entry), but adding a 
completely new datatype to the file may not be appropriate.  I'll continue to 
think about this.  Please send additional thoughts and feedback, as doing so 
is always helpful.
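
To make the distinction concrete, here is a minimal sketch of the narrower kind of automatic change: adding a converter to a datatype entry that already exists, while refusing to create a new entry. It is an illustration only, not tool shed code. The function name add_converter, the sample file contents, and the converter file name are assumptions.

```python
import xml.etree.ElementTree as ET


def add_converter(conf_xml, extension, converter_file, target_datatype):
    """Return conf_xml with a converter added to an existing datatype entry."""
    root = ET.fromstring(conf_xml.strip())
    matches = [
        d for d in root.findall("./registration/datatype")
        if d.get("extension") == extension
    ]
    if not matches:
        # Deliberately refuse to create a completely new datatype entry.
        raise KeyError("no existing datatype entry for extension %r" % extension)
    datatype = matches[0]
    already_present = any(
        c.get("file") == converter_file for c in datatype.findall("converter")
    )
    if not already_present:
        ET.SubElement(
            datatype,
            "converter",
            {"file": converter_file, "target_datatype": target_datatype},
        )
    return ET.tostring(root, encoding="unicode")


# Illustrative stand-in for a datatypes_conf.xml file.
SAMPLE_CONF = """
<datatypes>
  <registration>
    <datatype extension="fastq" type="galaxy.datatypes.sequence:Fastq"/>
  </registration>
</datatypes>
"""

# The converter file name below is hypothetical.
updated = add_converter(
    SAMPLE_CONF, "fastq", "fastq_to_tabular_converter.xml", "tabular"
)
print(updated)
```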

Thanks!

Greg


On Oct 5, 2011, at 11:48 PM, Duddy, John wrote:

One of the things we’re facing is the sheer size of a whole human genome at 30x 
coverage. An effective way to deal with that is by compressing the FASTQ files. 
That works for BWA and our ELAND, which can directly read a compressed FASTQ, 
but other tools crash when reading compressed FASTQ files. One way to 
address that would be to introduce a new type, for example “CompressedFastQ”, 
with a conversion to FASTQ defined. BWA could take both types as input. This 
would allow the best of both worlds – efficient storage and use by all existing 
tools.
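
A minimal sketch of what the conversion step could look like, assuming the compressed files are gzip-compressed (the compression format is not stated above). The function names are hypothetical, and this is a standalone script rather than a Galaxy converter definition.

```python
import gzip
import shutil


def is_gzip(path):
    """Return True if the file starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        return handle.read(2) == b"\x1f\x8b"


def compressed_fastq_to_fastq(src_path, dest_path):
    """Decompress a gzip-compressed FASTQ file to plain FASTQ, streaming in chunks."""
    with gzip.open(src_path, "rb") as src, open(dest_path, "wb") as dest:
        shutil.copyfileobj(src, dest)
```

Streaming with shutil.copyfileobj avoids loading the whole file into memory, which matters at the file sizes discussed here. Tools that read the compressed type directly would skip this step.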

Another example would be adding the CASAVA tools to Galaxy. Some of the 
statistics generation tools use custom file formats. To be able to make the use 
of those tools optional and configurable, they should be separate from the 
aligner, but that would require that Galaxy be made aware of the custom file 
formats – we’d have to add a datatype.

John Duddy
Sr. Staff Software Engineer
Illumina, Inc.
9885 Towne Centre Drive
San Diego, CA 92121
Tel: 858-736-3584
E-mail: jduddy at illumina.com

From: Greg Von Kuster [mailto:greg at bx.psu.edu]
Sent: Wednesday, October 05, 2011 6:25 PM
To: Duddy, John
Cc: galaxy-dev at lists.bx.psu.edu
Subject: Re: [galaxy-dev] Tool shed and datatypes

Hello John,

The Galaxy tool shed currently is not enabled to automatically edit the 
datatypes_conf.xml file, although I could add this feature if the need exists.  
Can you elaborate on what you are looking to do regarding this?

Thanks!


On Oct 5, 2011, at 1:52 PM, Duddy, John wrote:


Can we introduce new file types via tools in the tool shed? It seems Galaxy can 
load them if they are in the datatypes configuration file. Does tool 
installation automate the editing of that file?


John Duddy

___________________________________________________________
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

 http://lists.bx.psu.edu/

Greg Von Kuster
Galaxy Development Team
greg at bx.psu.edu
