Before we go too much further down this path with datatypes, I'm wondering if some of us should put together a spec of some kind that allows us all to agree on the direction. For example, I'm wondering if datatypes should be versioned and have a namespaced identifier, much like the Tool Shed's guid identifier for tools. I haven't thought too much about whether this would pose backward compatibility issues or not. Discussion on this is welcome.
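As one way to picture the idea, here is a minimal Python sketch of a namespaced, versioned datatype identifier. Everything in it is an assumption made for illustration: the layout is modelled on the Tool Shed tool guid (host, "repos", owner, repository, id, version), and the class DatatypeGuid, the host toolshed.example.org, and the owner and repository names are hypothetical. It is not an existing Galaxy or Tool Shed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatatypeGuid:
    """Hypothetical namespaced, versioned identifier for a datatype."""

    tool_shed: str
    owner: str
    repository: str
    extension: str
    version: str

    def __str__(self) -> str:
        # Assumed layout: <tool_shed>/repos/<owner>/<repository>/<extension>/<version>
        return "/".join(
            [self.tool_shed, "repos", self.owner, self.repository,
             self.extension, self.version]
        )

    @classmethod
    def parse(cls, guid: str) -> "DatatypeGuid":
        parts = guid.split("/")
        if len(parts) != 6 or parts[1] != "repos":
            raise ValueError(f"not a datatype guid: {guid!r}")
        tool_shed, _, owner, repository, extension, version = parts
        return cls(tool_shed, owner, repository, extension, version)


# Two versions of the same format stay distinguishable by their identifiers.
stockholm_1_0 = DatatypeGuid(
    "toolshed.example.org", "some_owner", "stockholm_datatype", "stockholm", "1.0"
)
stockholm_1_1 = DatatypeGuid.parse(
    "toolshed.example.org/repos/some_owner/stockholm_datatype/stockholm/1.1"
)
```

With identifiers of this kind, two versions of one format could be installed side by side without clashing on the bare extension name. Whether that breaks backward compatibility for existing histories and tools is exactly the open question above.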
Greg Von Kuster

On Jul 22, 2014, at 7:19 PM, Greg Von Kuster <g...@bx.psu.edu> wrote:

> Hi Björn,
>
>
> On Jul 22, 2014, at 6:01 PM, Björn Grüning <bjoern.gruen...@gmail.com> wrote:
>
>> Hi Greg,
>>
>> thanks for the clarification. Please see my comments below.
>>
>>> On Jul 20, 2014, at 3:22 PM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
>>>
>>>> On Sun, Jul 20, 2014 at 6:23 PM, Björn Grüning <bjo...@gruenings.eu> wrote:
>>>>> Hi,
>>>>>
>>>>> single datatype definitions only work if you haven't defined any
>>>>> converters.
>>>>> Let's assume I have a datatype X and want to ship an X -> Y converter
>>>>> (Y -> X is also possible): we will end up with a dependency loop,
>>>>> won't we? The X repository will depend on the Y repository, but Y
>>>>> depends on X, because we want to include a Y -> X converter.
>>>>>
>>>>> Any idea how to solve that?
>>>
>>> I don't see a problem here, so I'm hoping I'm correctly understanding the
>>> issue.
>>>
>>> If we have:
>>>
>>> repo_x contains the single datatype X
>>> repo_y contains the single datatype Y
>>> repo_x_to_y_converter contains a tool that converts datatype X to datatype
>>> Y (this repository also defines 2 dependency relationships, one to repo_x
>>> and another to repo_y)
>>> repo_y_to_x_converter contains a tool that converts datatype Y to datatype
>>> X (this repository also defines 2 dependency relationships, one to repo_x
>>> and another to repo_y)
>>>
>>> Now if we want to install both the repo_x_to_y_converter and the
>>> repo_y_to_x_converter automatically whenever either one is installed, we
>>> have 2 options:
>>>
>>> 1) define a 3rd dependency relationship for repo_x_to_y_converter to
>>> depend on repo_y_to_x_converter and, similarly, a 3rd dependency
>>> relationship for repo_y_to_x_converter on repo_x_to_y_converter.
>>> This does indeed
>>> create a circular repository dependency relationship, but the Tool Shed
>>> installation process will handle it correctly, installing all 4
>>> repositories with proper dependency relationships created between them.
>>
>> Does that mean circular dependencies will be no problem at all?
>
>
> Yes, the Tool Shed handles circular dependency definitions of any variety, so
> circular dependency definitions pose no problem.
>
>
>> Do you consider including the converters in the datatypes as
>> best practice? (These converters are implicit Galaxy converters.)
>> I would have only two repositories with circular dependencies.
>
>
> Yes, however, there are some current limitations in the framework detailed on
> this Trello card:
> https://trello.com/c/Ho3ra4b9/206-add-support-for-datatype-converters-and-display-applications
>
> Tag sets like the following that are defined in a datatypes_conf.xml file
> contained in a repository should be correctly loaded into the in-memory
> datatypes registry when the repository is installed into Galaxy. However, it
> has been quite a while since I've worked in this area, so let me know if you
> encounter any issues. The current best practice is probably that the
> converters themselves would each individually be in separate repositories
> (just like all Galaxy tools), but this can certainly be discussed if
> appropriate. Community thoughts are welcome here!
>
> <datatype extension="bam" type="galaxy.datatypes.binary:Bam"
>           mimetype="application/octet-stream" display_in_upload="true">
>     <converter file="bam_to_bai.xml" target_datatype="bai"/>
>     <converter file="bam_to_bigwig_converter.xml" target_datatype="bigwig"/>
>     <display file="ucsc/bam.xml" />
>     <display file="ensembl/ensembl_bam.xml" />
>     <display file="igv/bam.xml" />
>     <display file="igb/bam.xml" />
> </datatype>
>
>
>>
>>> 2) Instead of creating a circular dependency relationship between
>>> repo_x_to_y_converter and repo_y_to_x_converter, create an additional
>>> suite_definition_x_y repository (of type "repository_suite_definition") that
>>> defines relationships to repo_x_to_y_converter and repo_y_to_x_converter,
>>> ultimately installing all 4 repositories, but without defining any circular
>>> dependency relationships.
>>
>> repo_x_to_y_converter and repo_y_to_x_converter would have dependencies on
>> datatype X and Y, so I do not see the need for a suite_definition ... unless
>> it is some collection like the emboss_datatypes …
>
> I agree.
>
>
>>
>> My scenario is more that the converters are not tools; they are implicit
>> converters and should _not_ be displayed in the tool panel.
>> As far as I know they need to be defined inside the datatypes_conf.xml file.
>
>
> Yes, they must be defined inside the datatypes_conf.xml file. However,
> converters are just special Galaxy Tools (they are "special" in the same way
> that Data Manager tools are special). They are loaded into the in-memory
> Galaxy tools registry, but not displayed in the tool panel.
>
>
>>
>> I think if circular dependencies are not a problem I will try to implement a
>> proof of concept. EMBOSS is now split:
>
> Sounds good - circular dependencies should pose no problems.
>
>
>>
>> https://github.com/bgruening/galaxytools/tree/master/datatypes/emboss_datatypes
>>
>> Thanks Greg!
>> Bjoern
>>
>>> Either of the above 2 scenarios will correctly install the 4 repositories.
>>>
>>> Let me know if I'm missing something here.
>>>
>>> Thanks!
>>>
>>> Greg
>>>
>>>>
>>>> Excellent example!
>>>>
>>>>> How to handle versions of datatypes? Extra repositories for stockholm 1.0
>>>>> and 1.1? If so ... the associated Python file (sniffing, splitting ...)
>>>>> should also be versioned, shouldn't it? What happens if I have two
>>>>> stockholm.py files in my system?
>>>>
>>>> Potentially you might need/want to define those as two different
>>>> Galaxy datatypes?
>>>>
>>>>> @Peter, can we create a stripped-down, Python-only Biopython egg? All
>>>>> parsers should be included; Bio.SeqIO should be sufficient I think.
>>>>
>>>> Right now, yes in principle (and this is fine from the licence point of
>>>> view), but in practice this is a fair chunk of work. However, we are
>>>> looking at this - see https://github.com/biopython/biopython/issues/349
>>>>
>>>> Peter
>>>>
>>>> ___________________________________________________________
>>>> Please keep all replies on the list by using "reply all"
>>>> in your mail client. To manage your subscriptions to this
>>>> and other Galaxy lists, please use the interface at:
>>>> http://lists.bx.psu.edu/
>>>>
>>>> To search Galaxy mailing lists use the unified search at:
>>>> http://galaxyproject.org/search/mailinglists/
>>>>
>>>
>>>
>>> _______________________________________________
>>> galaxy-iuc mailing list
>>> galaxy-...@lists.bx.psu.edu
>>> http://lists.bx.psu.edu/listinfo/galaxy-iuc
>>>
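
For readers following the circular-dependency discussion in the thread (option 1), here is a minimal Python sketch of why a cycle between the two converter repositories need not cause trouble at install time. It is only an illustration under stated assumptions: the function install_closure and the depends_on mapping are hypothetical, the repository names are the placeholders used in the thread, and this is not the Tool Shed's actual installation code.

```python
def install_closure(requested, depends_on):
    """Return every repository reachable from `requested`, each exactly once."""
    to_install = []
    seen = set()
    stack = [requested]
    while stack:
        repo = stack.pop()
        if repo in seen:
            continue  # already scheduled, so a cycle cannot loop forever
        seen.add(repo)
        to_install.append(repo)
        stack.extend(depends_on.get(repo, []))
    return to_install


# Option 1 from the thread: each converter repository depends on both
# datatype repositories and on the other converter (a circular definition).
depends_on = {
    "repo_x": [],
    "repo_y": [],
    "repo_x_to_y_converter": ["repo_x", "repo_y", "repo_y_to_x_converter"],
    "repo_y_to_x_converter": ["repo_x", "repo_y", "repo_x_to_y_converter"],
}

print(sorted(install_closure("repo_x_to_y_converter", depends_on)))
```

Requesting either converter yields all 4 repositories, while requesting repo_x alone yields only repo_x. Tracking the set of already visited repositories is what keeps the circular definition from recursing indefinitely.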