Hi John,

The general question, I think, is whether reproducibility is important.  If it 
is, then we should not introduce new behavior that adversely impacts it.  There 
are undoubtedly scenarios where reproducibility is not currently absolutely 
guaranteed, but those areas of weakness should be corrected (as time and 
resources allow) when they are discovered if reproducibility is one of the 
desired features.

Please see my inline comments too.

On Jul 18, 2014, at 11:59 AM, John Chilton <jmchil...@gmail.com> wrote:

> Does the current implementation really handle datatypes in a reproducible 
> manner - if I have a repo which in revision 1 defines foo1 as a text subtype, 
> foo2 as a tabular type and foo3 as a new type in foo.py and then in revision 
> 2 foo1 is defined as a binary subtype, foo2 and foo3 disappear and foo4 is a 
> new type in foo.py (which no longer defines foo3) how could you possibly 
> resolve that in a "reproducible" manner?

So you have:

repo_a revision 1:
foo1 datatype as text subtype
foo2 datatype as tabular
foo3 new datatype in foo.py

repo_a revision 2:
foo1 datatype as binary subtype
foo4 new type in foo.py

I would say that this is an example of a "bad practice" on the part of the 
repository owner, but, of course, this scenario can certainly occur.  In this 
case, the current implementation creates 2 separate installable revisions of 
repo_a which are loaded into the datatypes registry in a specific order.  If 
repo_a revision 1 was installed first, then it will always be loaded first, and 
the foo1 and foo4 datatypes contained in repo_a revision 2 will not be loaded 
because they are currently considered conflicting datatypes.  So currently, 
reproducibility is ensured, but the versions of foo1 and foo4 in revision 2 
cannot be used.  This may not be ideal, but in order to allow both versions to 
be used, more than the datatype extensions will be needed in order to 
differentiate datatypes (i.e., some namespaced identifier similar to the Tool 
Shed's guid for tools).
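
To make the load-order behavior above concrete, here is a minimal sketch.  It 
is not the actual Galaxy registry code: the Datatype and Registry classes and 
the namespaced_id helper are hypothetical names used only for illustration.  
The sketch assumes a datatype is treated as conflicting if its extension is 
already registered, or if its module file (foo.py) was already supplied by a 
different revision; the real conflict rules may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datatype:
    extension: str            # e.g. "foo1"
    definition: str           # e.g. "text subtype"
    module: Optional[str]     # e.g. "foo.py" for new types, else None
    revision: int             # repo_a revision that defines it


class Registry:
    """Hypothetical first-loaded-wins registry (an assumption, not Galaxy's)."""

    def __init__(self):
        self.by_extension = {}
        self.module_owner = {}   # module file -> revision that supplied it
        self.skipped = []

    def load(self, datatypes):
        for dt in datatypes:
            owner = self.module_owner.get(dt.module) if dt.module else None
            ext_conflict = dt.extension in self.by_extension
            module_conflict = owner is not None and owner != dt.revision
            if ext_conflict or module_conflict:
                # Conflicting datatypes are not loaded.
                self.skipped.append(dt)
                continue
            self.by_extension[dt.extension] = dt
            if dt.module:
                self.module_owner[dt.module] = dt.revision


def namespaced_id(repo, revision, extension):
    """Hypothetical guid-like key that would let both versions coexist."""
    return f"{repo}/{revision}/{extension}"


REV1 = [
    Datatype("foo1", "text subtype", None, 1),
    Datatype("foo2", "tabular", None, 1),
    Datatype("foo3", "new type", "foo.py", 1),
]
REV2 = [
    Datatype("foo1", "binary subtype", None, 2),
    Datatype("foo4", "new type", "foo.py", 2),
]

registry = Registry()
for revision in (REV1, REV2):   # revision 1 was installed first
    registry.load(revision)

print(sorted(registry.by_extension))
print([dt.extension for dt in registry.skipped])
```

Because the load order is fixed, rerunning this always yields the same 
registry, which is the reproducibility property; the cost is that the 
revision 2 entries end up in the skipped list.  Keying on something like 
namespaced_id instead of the bare extension is one way both definitions of 
foo1 could be kept apart.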


> Some of your tools are going to expect foo1 to be one thing - others 
> something else. You are only going to place 1 copy of foo.py on the 
> PYTHONPATH right (or at least python will only load one)? Is it going to 
> define foo3 or foo4? In addition to lacking reproducibility within one 
> instance - if you are somehow trying to preserve all the datatypes a 
> repository has ever defined I feel like after a long stream of such updates - 
> the behavior of the datatypes is going to vary from one installation to 
> another that installed different repository versions. Hence - reproducibility 
> across instances is subtly broken as well? 
> 
> None of this is a solution of course - this problem strikes me as being very 
> difficult. 
> 
> That said - I think correctness and reproducibility across instances is more 
> important than reproducibility within the same instance over time - so for 
> that reason I think there only being one installable revision of datatypes 
> might be a big step forward relative to the status quo. Intuitively - if we 
> are not namespacing/versioning datatypes - there should only be one 
> definition and it should be the most recently installed one right?
> 
> It would also resolve this https://trello.com/c/oTq2Kewd problem - where 
> unsniffable binary datatypes are treated as sniffable if there was ever an 
> installed version that was some sniffable datatype.
> 
> -John
> 
> On Jul 17, 2014 12:35 PM, "Greg Von Kuster" <g...@bx.psu.edu> wrote:
> This would be easy to implement, but could adversely affect reproducibility.  
> If a repository containing datatypes always had only a single installable 
> revision (i.e., the changelog tip), then any datatypes defined in an early 
> changeset revision that are removed in a later changeset revision would no 
> longer be available.
> 
> Greg
> 
> On Jul 17, 2014, at 1:30 PM, Peter Cock <p.j.a.c...@googlemail.com> wrote:
> 
> > On Thu, Jul 17, 2014 at 6:10 PM, Björn Grüning
> > <bjoern.gruen...@gmail.com> wrote:
> >>
> >> ... but the problem will stay the same ... one [datatype definition]
> >> repository can have multiple versions ...
> >>
> >
> > I like your idea that like tool dependency definitions, this should be a
> > special repository type on the ToolShed:
> >
> > Earlier, Björn Grüning <bjoern.gruen...@gmail.com> wrote:
> >>
> >> Imho datatypes should be handled like "Tool dependency definitions".
> >> There should be only one "installable revision".
> >>
> >
> > This is something Greg will have to comment on - there may be
> > ramifications I'm not seeing.
> >
> > Peter
> >
> > ___________________________________________________________
> > Please keep all replies on the list by using "reply all"
> > in your mail client.  To manage your subscriptions to this
> > and other Galaxy lists, please use the interface at:
> >  http://lists.bx.psu.edu/
> >
> > To search Galaxy mailing lists use the unified search at:
> >  http://galaxyproject.org/search/mailinglists/
> >
> 

