Hi Chris,
Thanks for the feedback. I suspect you may be right about the
optimisation - lots of other pieces have to fall into place for it to
work. Having subfeatures indicate their containers is indeed how GFF3
works (and is also how existing DAS works), but in most cases it's a
fair bit more verbose.
The other aspect I forgot to mention was the DAS-DAS2 transition. The
parent/part syntax is borrowed directly from DAS2 as I'm keen to avoid
more divergence when there remains a possibility of uniting them. If we
don't keep both elements, this isn't so important though.
Speaking personally, I'm not too worried that the direction of the
relationship is unclear with parent/part, as I believe it's reasonably
obvious from the XML; but then again, I already know what to expect. So I
would certainly value your perspective if you find it significantly confusing.
Cheers,
Andy
Chris Mungall wrote:
I suggest you name relations such that the inverses and directionality
are obvious
part_of / has_part
parent_of / child_of
has_parent / has_child
But not
part / parent
The argument for specifying both seems like premature optimization. I
suggest you align what you're doing with GFF3 as far as possible and
have subfeatures indicate their containing features.
On Feb 18, 2009, at 8:18 AM, Andy Jenkinson wrote:
Hi all,
As you may know, soon a new revision of the DAS specification will be
published. One of the features to be added is improved support for
hierarchical features, and I'm looking for input regarding a detail of
how this will be done.
The plan is to replace the <GROUP> structure with something similar to
the DAS/2 approach: parent features have concise <PART> elements that
identify other (separate) child features. Child features have <PARENT>
elements to represent the reciprocal relationship. This means the
group data no longer needs to be duplicated when shared by several
features, and groups can themselves have start/endpoints:
<FEATURE id="A1">
<PART id="B1" />
<PART id="B2" />
... start, end, notes and other verbose content ...
</FEATURE>
<FEATURE id="B1">
<PARENT id="A1" />
... content ...
</FEATURE>
<FEATURE id="B2">
<PARENT id="A1" />
... content ...
</FEATURE>
Here, parent and child each contain a reference to the other, so the
same link is represented twice. However, it would be possible to
represent the relationship even if only one feature links to the other:
<FEATURE id="A1">
<PART id="B1" />
...
</FEATURE>
<FEATURE id="B1">
...
</FEATURE>
Therefore the option exists to omit the <PARENT> element from the
specification entirely. Over the last couple of years we have seen DAS
sources become more and more dense, and browsers wishing to display
larger regions. As a result, there is significant pressure to minimise
the verbosity of the XML response (there are other changes to the
upcoming spec to help with this). Whilst DAS2's alternative content
negotiation feature sidesteps the issue, DAS does not yet have this
and in any case it is my belief that the fallback XML format should
still be fit for purpose.
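As a rough illustration of the <PART>-only option described above, the sketch below shows how a client could recover the child-to-parent mapping once the whole response has been parsed. It is not part of the proposed specification: the <FEATURES> wrapper element and the function name parents_from_parts are assumptions made purely so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Hypothetical response fragment that uses only <PART> elements.
# The <FEATURES> wrapper is an assumption so the fragment has one root.
XML = """
<FEATURES>
  <FEATURE id="A1">
    <PART id="B1" />
    <PART id="B2" />
  </FEATURE>
  <FEATURE id="B1" />
  <FEATURE id="B2" />
</FEATURES>
"""


def parents_from_parts(xml_text):
    """Return {child_id: [parent_id, ...]} derived from <PART> elements alone."""
    root = ET.fromstring(xml_text)
    parents = {}
    for feature in root.iter("FEATURE"):
        for part in feature.findall("PART"):
            # Each <PART> names a child, so invert it into a child -> parent entry.
            parents.setdefault(part.get("id"), []).append(feature.get("id"))
    return parents


child_to_parents = parents_from_parts(XML)
print(child_to_parents)
```

The catch is that this inversion is only complete after the entire document has been read, which is where the streaming argument comes in.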
The counter argument (i.e. the case for requiring both <PARENT> and
<PART> elements) is based around the rendering efficiency benefits of
streaming. If a client knows for sure that it has parsed all features
that are related to each other, it can render them while it waits for
the server to send the rest of the response. A client could
potentially use this to offer a significant usability boost - a user's
perception of the speed of an interface is greatly influenced by how
fast a display starts to render rather than the time it takes to
complete. But at the moment there are no DAS clients that use this (it
is not possible with the current spec, and some clients such as
Ensembl cannot due to the way the data is rendered). I am not sure to
what extent it would be used in future either; for example, it could
not be used where post-processing of the entire set of features is
necessary (e.g. binning).
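To make the streaming argument more concrete, here is a minimal sketch of how a client might detect complete groups while the response is still arriving, assuming bi-directional <PARENT>/<PART> references. The <FEATURES> wrapper, the sample data and the function name stream_complete_groups are all hypothetical; no existing DAS client is known to work this way.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical response with bi-directional references; C1 is unrelated.
XML = b"""<FEATURES>
  <FEATURE id="A1"><PART id="B1" /><PART id="B2" /></FEATURE>
  <FEATURE id="B1"><PARENT id="A1" /></FEATURE>
  <FEATURE id="C1" />
  <FEATURE id="B2"><PARENT id="A1" /></FEATURE>
</FEATURES>"""


def stream_complete_groups(stream):
    """Yield sorted id lists as soon as every feature in a group has been parsed."""
    links = {}  # feature id -> ids it references via <PART> or <PARENT>
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag != "FEATURE":
            continue
        fid = elem.get("id")
        links[fid] = {c.get("id") for c in elem if c.tag in ("PART", "PARENT")}
        # Follow references outward from the feature that was just parsed.
        group, todo, complete = set(), [fid], True
        while todo:
            current = todo.pop()
            if current in group:
                continue
            if current not in links:
                complete = False  # a referenced feature has not arrived yet
                break
            group.add(current)
            todo.extend(links[current])
        if complete:
            yield sorted(group)  # safe to render this group now
        elem.clear()  # release the parsed content of this feature


for ready in stream_complete_groups(io.BytesIO(XML)):
    print(ready)
```

With <PART>-only references the same check is not reliable: a child that arrives before its parent carries no references, so it would look like a complete standalone feature.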
So my question is: should the specification require bi-directional
references (<PARENT> and <PART>), or uni-directional (<PART> only)?
Whichever approach is taken, replacing the <GROUP> structure will
significantly reduce verbosity for groups with large numbers of child
features, but do we want to reduce this further by removing <PARENT>
elements at the cost of the potential for "streaming"?
Apologies for the long and technical post.
Andy
_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das