Thanks Jim,

This is on my development plan, but it may take a few days for me to get 
heavily into it.  I'll get back to you hopefully some time next week.

Greg

On Oct 21, 2011, at 1:13 PM, Jim Johnson wrote:

> Greg,
> 
> I put the gmap tool suite in the Galaxy Tool Shed.  Let me know if there is 
> more I should do.  
>   
> It has 5 Galaxy tools:
>     GMAP            - Genomic Mapping and Alignment Program for mRNA and EST sequences
>     GSNAP           - Genomic Short-read Nucleotide Alignment Program
>     GMAP Build      - a database genome index for GMAP and GSNAP
>                       (calls: gmap_build, iit_store, snpindex, cmetindex, atoiindex)
>     GMAP SNP Index  - build index files for known SNPs (calls: iit_store, snpindex)
>     GMAP IIT        - create a map store for known genes or SNPs (calls: iit_store)
> 
> It uses these added datatypes:
> % grep -E '(^class | file_ext)' lib/galaxy/datatypes/gmap.py 
> class GmapDB( Text ):
>     file_ext = 'gmapdb'
> class GmapSnpIndex( Text ):
>     file_ext = 'gmapsnpindex'
> class IntervalIndexTree( Text ):
>     file_ext = 'iit'
> class SpliceSitesIntervalIndexTree( IntervalIndexTree ):
>     file_ext = 'splicesites.iit'
> class IntronsIntervalIndexTree( IntervalIndexTree ):
>     file_ext = 'introns.iit'
> class SNPsIntervalIndexTree( IntervalIndexTree ):
>     file_ext = 'snps.iit'
> class IntervalAnnotation( Text ):
>     file_ext = 'gmap_annotation'
> class SpliceSiteAnnotation(IntervalAnnotation):
>     file_ext = 'gmap_splicesites'
> class IntronAnnotation(IntervalAnnotation):
>     file_ext = 'gmap_introns'
> class SNPAnnotation(IntervalAnnotation):
>     file_ext = 'gmap_snps'
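
The subclassing in this list matters if format matching follows the class hierarchy, so that a tool asking for 'iit' would also accept 'splicesites.iit'. A minimal sketch of that idea, assuming a stand-in Text base class so it runs without Galaxy; build_extension_map and accepts are hypothetical helpers, not Galaxy API:

```python
# Stand-in for the Galaxy Text datatype so the sketch runs on its own.
class Text:
    file_ext = 'txt'


class GmapDB(Text):
    file_ext = 'gmapdb'


class IntervalIndexTree(Text):
    file_ext = 'iit'


class SpliceSitesIntervalIndexTree(IntervalIndexTree):
    file_ext = 'splicesites.iit'


class IntervalAnnotation(Text):
    file_ext = 'gmap_annotation'


class SNPAnnotation(IntervalAnnotation):
    file_ext = 'gmap_snps'


def build_extension_map(*classes):
    """Map each class's file_ext to the class (hypothetical helper)."""
    return {cls.file_ext: cls for cls in classes}


def accepts(declared_ext, dataset_ext, ext_map):
    """Return True if a dataset of type dataset_ext satisfies declared_ext.

    Assumption: a declared format accepts its own class and any subclass.
    """
    return issubclass(ext_map[dataset_ext], ext_map[declared_ext])
```

Under that assumption, accepts('iit', 'splicesites.iit', ext_map) holds while the reverse does not.
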
> 
> I added a requirement tag for the datatypes to the tool-configs:
> % grep 'requirement.*datatype' *.xml
> gmap_build.xml:      <requirement type="datatype">gmapdb</requirement>
> gmap_build.xml:      <requirement type="datatype">gmap_snps</requirement>
> gmap.xml:    <requirement type="datatype">gmapdb</requirement>
> gmap.xml:    <requirement type="datatype">gmap_annotation</requirement>
> gmap.xml:    <requirement type="datatype">gmap_splicesites</requirement>
> gmap.xml:    <requirement type="datatype">gmap_introns</requirement>
> gmap.xml:    <requirement type="datatype">gmap_snps</requirement>
> gsnap.xml:      <requirement type="datatype">gmapdb</requirement>
> gsnap.xml:      <requirement type="datatype">gmapsnpindex</requirement>
> gsnap.xml:      <requirement type="datatype">splicesites.iit</requirement>
> gsnap.xml:      <requirement type="datatype">introns.iit</requirement>
> iit_store.xml:      <requirement type="datatype">gmap_annotation</requirement>
> iit_store.xml:      <requirement type="datatype">gmap_snps</requirement>
> iit_store.xml:      <requirement type="datatype">iit</requirement>
> iit_store.xml:      <requirement type="datatype">splicesites.iit</requirement>
> iit_store.xml:      <requirement type="datatype">introns.iit</requirement>
> iit_store.xml:      <requirement type="datatype">snps.iit</requirement>
> snpindex.xml:      <requirement type="datatype">gmapsnpindex</requirement>
> snpindex.xml:      <requirement type="datatype">gmapdb</requirement>
> snpindex.xml:      <requirement type="datatype">gmap_snps</requirement>
> snpindex.xml:      <requirement type="datatype">snps.iit</requirement>
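
An installer could collect these tags before deciding whether a tool can be loaded. A minimal sketch using only the Python standard library; the function name, the tool id, and the sample "binary" requirement are made up for illustration, and only the type="datatype" tags come from the configs above:

```python
import xml.etree.ElementTree as ET


def datatype_requirements(tool_xml):
    """Return the datatype extensions a tool config declares, in file order."""
    root = ET.fromstring(tool_xml)
    return [
        req.text.strip()
        for req in root.iter('requirement')
        if req.get('type') == 'datatype' and req.text
    ]


# Hypothetical tool config; the datatype tags mirror the snpindex.xml lines above.
EXAMPLE_TOOL_XML = """
<tool id="example_snpindex" name="Example SNP Index">
  <requirements>
    <requirement type="binary">snpindex</requirement>
    <requirement type="datatype">gmapsnpindex</requirement>
    <requirement type="datatype">gmapdb</requirement>
    <requirement type="datatype">gmap_snps</requirement>
    <requirement type="datatype">snps.iit</requirement>
  </requirements>
</tool>
"""
```

Requirements of other types, such as "binary", are skipped, so the result can be compared directly against the set of extensions the server already knows.
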
> 
> Thanks,
> 
> JJ
> 
> 
> On 10/18/11 10:18 AM, Greg Von Kuster wrote:
>> 
>> Jim,
>> 
>> Sounds great - this will be very helpful!
>> 
>> Greg
>> 
>> On Oct 18, 2011, at 11:03 AM, Jim Johnson wrote:
>> 
>>> Greg,
>>> 
>>> The mothur_toolsuite in the ToolShed  contains a file with added datatypes 
>>> for metagenomics (used by mothur and some by qiime):
>>> mothur_toolsuite/mothur/lib/galaxy/datatypes/metagenomics.py
>>> The README has info on how I incorporated mothur into our local galaxy 
>>> server.
>>> 
>>> I'm also working on GMAP/GSNAP  (  http://research-pub.gene.com/gmap/ )
>>> So far I've created a GmapDB class,  analogous to the ngsindex.BowtieIndex 
>>> class, but with more metadata.
>>> I'm also adding an IntervalIndexTree class for indexing maps of splice 
>>> junctions, introns, and SNPs.
>>> I'll send you this as soon as I've got it working.
>>> 
>>> Thanks,
>>> 
>>> JJ
>>> 
>>> 
>>> On 10/17/11 1:06 PM, Greg Von Kuster wrote:
>>>> We've digested this topic a bit here at Galaxy Central, and agree that at 
>>>> some point ( maybe soon for very basic functionality ) we need to provide 
>>>> support for new data types in tool shed repositories.  It would be very 
>>>> helpful ( and significantly speed up the development process ) if the 
>>>> community could provide at least 2 different tools that use data types not 
>>>> included in the Galaxy distribution ( sending me a tarball that includes 
>>>> all the tool dependencies, including the new data type class would be 
>>>> ideal ).  When I get them I'll add this new feature set to my development 
>>>> list.
>>>> 
>>>> Thanks everyone for all the input on this!
>>>> 
>>>> Greg Von Kuster
>>>> 
>>>> 
>>>> On Oct 7, 2011, at 2:05 PM, Jim Johnson wrote:
>>>> 
>>>>> Greg,
>>>>> 
>>>>> It would be great if there were a way to expand upon the core datatypes 
>>>>> using the ToolShed.
>>>>> 
>>>>> Would it be possible to have a separate datatype repository within the 
>>>>> ToolShed?
>>>>> 
>>>>> Datatype
>>>>>  name=""
>>>>>  description=""
>>>>>  datatype_dependencies=[]
>>>>>  definition=<python code>
>>>>> 
>>>>> The tool config could be expanded to have requirement for datatypes.
>>>>>   <requirement type="datatype">ssmap</requirement>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> Table datatype
>>>>>    Column    |          Type          |                       Modifiers
>>>>> -------------+------------------------+-------------------------------------------------------
>>>>>  id          | integer                | not null default nextval('datatype_id_seq'::regclass)
>>>>>  name        | character varying(255) |
>>>>>  version     | character varying(40)  |
>>>>>  description | text                   |
>>>>>  definition  | text                   |
>>>>> UNIQUE (name)
>>>>> 
>>>>> Table datatype_datatype_association
>>>>>    Column    |  Type   |                       Modifiers
>>>>> -------------+---------+-------------------------------------------------------
>>>>>  id          | integer | not null default nextval('datatype_id_seq'::regclass)
>>>>>  datatype_id | integer |
>>>>>  requires_id | integer |
>>>>> FOREIGN KEY (datatype_id) REFERENCES datatype(id)
>>>>> FOREIGN KEY (requires_id) REFERENCES datatype(id)
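
The two tables can be tried out with an in-memory database. A rough sketch, assuming SQLite in place of PostgreSQL (so the sequence default becomes an INTEGER PRIMARY KEY); the helper names are hypothetical:

```python
import sqlite3

# SQLite rendering of the two proposed tables (an adaptation, not the original DDL).
SCHEMA = """
CREATE TABLE datatype (
    id          INTEGER PRIMARY KEY,
    name        VARCHAR(255) UNIQUE,
    version     VARCHAR(40),
    description TEXT,
    definition  TEXT
);
CREATE TABLE datatype_datatype_association (
    id          INTEGER PRIMARY KEY,
    datatype_id INTEGER REFERENCES datatype(id),
    requires_id INTEGER REFERENCES datatype(id)
);
"""


def make_db():
    """Create an in-memory database holding the two tables."""
    conn = sqlite3.connect(':memory:')
    conn.executescript(SCHEMA)
    return conn


def add_datatype(conn, name, version, description, definition, requires=()):
    """Insert a datatype and link it to the datatypes it depends on."""
    # Resolve dependencies first so nothing is inserted if one is missing.
    required_ids = []
    for req_name in requires:
        row = conn.execute(
            "SELECT id FROM datatype WHERE name = ?", (req_name,)
        ).fetchone()
        if row is None:
            raise ValueError("unknown required datatype: %s" % req_name)
        required_ids.append(row[0])
    cur = conn.execute(
        "INSERT INTO datatype (name, version, description, definition) "
        "VALUES (?, ?, ?, ?)",
        (name, version, description, definition),
    )
    for req_id in required_ids:
        conn.execute(
            "INSERT INTO datatype_datatype_association (datatype_id, requires_id) "
            "VALUES (?, ?)",
            (cur.lastrowid, req_id),
        )
    return cur.lastrowid


def required_names(conn, name):
    """Return the names of the datatypes that the named datatype requires."""
    rows = conn.execute(
        "SELECT r.name FROM datatype d "
        "JOIN datatype_datatype_association a ON a.datatype_id = d.id "
        "JOIN datatype r ON r.id = a.requires_id "
        "WHERE d.name = ? ORDER BY r.name",
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

Registering 'tabular' and then 'ssmap' with requires=['tabular'] makes required_names(conn, 'ssmap') return ['tabular'].
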
>>>>> 
>>>>> 
>>>>> Then for my mothur metagenomics tools I could define:
>>>>> 
>>>>> name="ssmap"   description="Secondary Structure Map"  version="1.0"  
>>>>> datatype_dependencies=[tabular]
>>>>> definition=
>>>>> from galaxy.datatypes.tabular import Tabular
>>>>> class SecondaryStructureMap(Tabular):
>>>>>    file_ext = 'ssmap'
>>>>>    def __init__(self, **kwd):
>>>>>        """Initialize secondary structure map datatype"""
>>>>>        Tabular.__init__( self, **kwd )
>>>>>        self.column_names = ['Map']
>>>>> 
>>>>>    def sniff( self, filename ):
>>>>>        """
>>>>>        Determines whether the file is a secondary structure map format.
>>>>>        A single column with an integer value which indicates the row that
>>>>>        this row maps to.
>>>>>        Check to make sure that if structMap[10] = 380 then structMap[380] = 10.
>>>>>        """
>>>>> ...
>>>>> 
>>>>> 
>>>>> 
>>>>> 
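
The symmetry check described in the sniff docstring could look roughly like the following. This is a standalone sketch rather than the elided sniff body: the function name is hypothetical, rows are assumed to be numbered from 1, and treating 0 as "unpaired" is an assumption not stated above.

```python
def looks_like_structure_map(lines):
    """Check that lines form a symmetric single-column integer map.

    Row i (1-based) holds the row it pairs with; if row 10 holds 380,
    row 380 must hold 10. A value of 0 is assumed to mean "unpaired".
    """
    values = []
    for line in lines:
        fields = line.split()
        if len(fields) != 1:
            return False  # blank line or more than one column
        try:
            values.append(int(fields[0]))
        except ValueError:
            return False  # not an integer
    if not values:
        return False
    for row, partner in enumerate(values, start=1):
        if partner == 0:
            continue  # assumed marker for an unpaired row
        if not 1 <= partner <= len(values):
            return False  # points outside the file
        if values[partner - 1] != row:
            return False  # pairing is not symmetric
    return True
```

For example, the rows 3, 0, 1 pass (rows 1 and 3 point at each other), while 2, 3, 1 fail because row 2 does not point back at row 1.
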
>>>>> Then the align.check.xml tool_config could require the 'ssmap' datatype:
>>>>> 
>>>>> <tool id="mothur_align_check" name="Align.check" version="1.19.0">
>>>>>   <description>Calculate the number of potentially misaligned bases</description>
>>>>>   <requirements>
>>>>>     <requirement type="binary">mothur</requirement>
>>>>>     <requirement type="datatype">ssmap</requirement>
>>>>>   </requirements>
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>> John,
>>>>>> 
>>>>>> I've been following this message thread, and it seems it's gone in a 
>>>>>> direction that differs from your initial question about the possibility 
>>>>>> for Galaxy to handle automatic editing of the datatypes_conf.xml file 
>>>>>> when certain Galaxy tool shed tools are automatically installed.  There 
>>>>>> are some complexities to consider in attempting this.  One of the issues 
>>>>>> to consider is that the work for adding support for a new datatype to 
>>>>>> Galaxy lies outside of the intended function of the tool shed.  If new 
>>>>>> support is added to the Galaxy code base, an entry for that new datatype 
>>>>>> should be manually added to the table at the same time.  There may be 
>>>>>> benefits to enabling automatic changes to datatype entries that already 
>>>>>> exist in the file (e.g., adding a new converter for an existing datatype 
>>>>>> entry), but perhaps adding a completely new datatype to the file may not 
>>>>>> be appropriate.  I'll continue to think about this.  Please send additional 
>>>>>> thoughts and feedback, as doing so is always helpful.
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Greg
>>>>>> 
>>>>>> 
>>>>>> On Oct 5, 2011, at 11:48 PM, Duddy, John wrote:
>>>>>> 
>>>>>>> One of the things we’re facing is the sheer size of a whole human 
>>>>>>> genome at 30x coverage. An effective way to deal with that is by 
>>>>>>> compressing the FASTQ files. That works for BWA and our ELAND, which 
>>>>>>> can directly read a compressed FASTQ, but other tools crash when 
>>>>>>> reading compressed FASTQ files. One way to address that would be 
>>>>>>> to introduce a new type, for example “CompressedFastQ”, with a 
>>>>>>> conversion to FASTQ defined. BWA could take both types as input. This 
>>>>>>> would allow the best of both worlds – efficient storage and use by all 
>>>>>>> existing tools.
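
The conversion described here might be sketched as follows, assuming the compressed files are gzip (the message does not name a compression format). All function names are hypothetical and none of this is Galaxy's converter API:

```python
import gzip
import shutil

GZIP_MAGIC = b'\x1f\x8b'  # first two bytes of every gzip stream


def is_gzipped(path):
    """Detect gzip by its magic bytes rather than by file extension."""
    with open(path, 'rb') as handle:
        return handle.read(2) == GZIP_MAGIC


def open_fastq(path):
    """Open a FASTQ file for text reading, compressed or not.

    This is what a tool that accepts both types would do.
    """
    if is_gzipped(path):
        return gzip.open(path, 'rt')
    return open(path, 'r')


def convert_to_fastq(compressed_path, fastq_path):
    """Decompress to plain FASTQ for tools that cannot read gzip.

    Streams in chunks so a large file is never held in memory at once.
    """
    with gzip.open(compressed_path, 'rb') as src, open(fastq_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
```

A tool such as BWA would take the open_fastq path, while convert_to_fastq plays the role of the declared conversion for every other tool.
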
>>>>>>> 
>>>>>>> Another example would be adding the CASAVA tools to Galaxy. Some of the 
>>>>>>> statistics generation tools use custom file formats. To be able to make 
>>>>>>> the use of those tools optional and configurable, they should be 
>>>>>>> separate from the aligner, but that would require that Galaxy be made 
>>>>>>> aware of the custom file formats – we’d have to add a datatype.
>>>>>>> 
>>>>>>> John Duddy
>>>>>>> Sr. Staff Software Engineer
>>>>>>> Illumina, Inc.
>>>>>>> 9885 Towne Centre Drive
>>>>>>> San Diego, CA 92121
>>>>>>> Tel: 858-736-3584
>>>>>>> E-mail: jduddy at illumina.com
>>>>>>> 
>>>>>>> From: Greg Von Kuster [mailto:greg at bx.psu.edu]
>>>>>>> Sent: Wednesday, October 05, 2011 6:25 PM
>>>>>>> To: Duddy, John
>>>>>>> Cc: galaxy-dev at lists.bx.psu.edu
>>>>>>> Subject: Re: [galaxy-dev] Tool shed and datatypes
>>>>>>> 
>>>>>>> Hello John,
>>>>>>> 
>>>>>>> The Galaxy tool shed currently is not enabled to automatically edit the 
>>>>>>> datatypes_conf.xml file, although I could add this feature if the need 
>>>>>>> exists.  Can you elaborate on what you are looking to do regarding this?
>>>>>>> 
>>>>>>> Thanks!
>>>>>>> 
>>>>>>> 
>>>>>>> On Oct 5, 2011, at 1:52 PM, Duddy, John wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> Can we introduce new file types via tools in the tool shed? It seems 
>>>>>>> Galaxy can load them if they are in the datatypes configuration file. 
>>>>>>> Does tool installation automate the editing of that file?
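
For reference, automating that edit could look roughly like this. It is a sketch only: it assumes the datatypes configuration keeps its entries as datatype elements (with extension and type attributes) under a registration element, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def register_datatype(conf_xml, extension, type_path):
    """Return the config XML with a datatype entry added for extension.

    If the extension is already registered, the input is returned unchanged.
    """
    root = ET.fromstring(conf_xml)
    registration = root.find('registration')
    if registration is None:
        registration = ET.SubElement(root, 'registration')
    for existing in registration.findall('datatype'):
        if existing.get('extension') == extension:
            return conf_xml  # leave existing entries alone
    ET.SubElement(registration, 'datatype', extension=extension, type=type_path)
    return ET.tostring(root, encoding='unicode')


# Simplified sample file used for illustration; a real one has many more entries.
SAMPLE_CONF = """
<datatypes>
  <registration>
    <datatype extension="fastq" type="galaxy.datatypes.sequence:Fastq"/>
  </registration>
</datatypes>
"""
```

Calling register_datatype(SAMPLE_CONF, 'gmapdb', 'galaxy.datatypes.gmap:GmapDB') adds one entry, and calling it again with the result changes nothing, which keeps repeated installs from duplicating entries.
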
>>>>>>> 
>>>>>>> 
>>>>>>> John Duddy
>>>>>>> Sr. Staff Software Engineer
>>>>>>> Illumina, Inc.
>>>>>>> 9885 Towne Centre Drive
>>>>>>> San Diego, CA 92121
>>>>>>> Tel: 858-736-3584
>>>>>>> E-mail: jduddy at illumina.com
>>>>>>> 
>>>>>>> ___________________________________________________________
>>>>>>> Please keep all replies on the list by using "reply all"
>>>>>>> in your mail client.  To manage your subscriptions to this
>>>>>>> and other Galaxy lists, please use the interface at:
>>>>>>> 
>>>>>>> http://lists.bx.psu.edu/
>>>>>>> 
>>>>>>> Greg Von Kuster
>>>>>>> Galaxy Development Team
>>>>>>> greg at bx.psu.edu
>>>>>>> 
>>>> Greg Von Kuster
>>>> Galaxy Development Team
>>>> g...@bx.psu.edu
>>>> 
>>>> 
>>>> 
>>> 
>>> 
>> Greg Von Kuster
>> Galaxy Development Team
>> g...@bx.psu.edu
>> 
>> 
>> 
> 

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu



