Re: [DAS] Adjacent feature extension

Andy Jenkinson Mon, 07 Mar 2011 04:44:13 -0800

On 7 Mar 2011, at 11:51, Jonathan Warren wrote:

> On 7 Mar 2011, at 11:19, Andy Jenkinson wrote:
> 
>> On 7 Mar 2011, at 10:57, Jonathan Warren wrote:
>> 
>>> 
>>> My vote would ideally be to change feature_by_id to return one feature and 
>>> have the adjacent_feature as returning one feature. This in my opinion 
>>> would mean these capabilities on servers do "exactly as they say on the 
>>> tin" and would be easier to implement for data providers and are thus more 
>>> likely to be implemented?
>>> If the feature_id capability as it stands is needed it could be changed to 
>>> something more akin to what it means like feature_id_region but I would bet 
>>> no one would bother to change it/use it?
>>> 
>>> However the reality is that we are too late to change the old 
>>> feature_by_id, but I don't think we need to make the same mistake twice by 
>>> repeating it for adjacent_features?
>> 
>> I disagree. I think the problems with feature-by-id are that a) the name of 
>> the capability implies singular, and b) the concept itself (i.e. getting a 
>> feature by its ID) is such a common operation that is otherwise missing in 
>> DAS. I don't think either of those apply to an "adjacent" capability unless 
>> you specifically choose to call it "adjacent-feature" as opposed to 
>> "adjacent-features". I honestly don't think a capability called 
>> "adjacent-features" with a query structure like 
>> "/das/features?adjacent=foo:1" implies singular, rather the opposite in 
>> fact. To me that query suggests "get me the features adjacent to foo:1". 
>> True that 2 features is plural which still leaves a "one feature either 
>> side" interpretation possible, but IMO certainly not implicit enough to stop 
>> anyone implementing it to actually read the specification/documentation. Add 
>> to that the fact that this is an entirely new behaviour that we have the 
>> chance to properly document and make it clear exactly what the server must do.
>> 
>> So IMO we have a clear choice.
> I still think it's simpler to implement it for one feature either side and 
> keep complexity in the client. Generally how many people stay awake after line 
> 10 when reading the spec? :) Let's see if there are more votes...


It probably is simpler to implement (well, to implement with maximum 
efficiency) and I am not advocating one over the other, but IMO the 
implementation considerations are a separate part of our choice and are 
orthogonal to whether it's confusing for those implementing it and consequently 
whether we see divergence from the spec like we do with feature-by-id. As 
Gustavo says, he'd implement feature-by-id as one feature because that's what 
he thinks it means, not because it's difficult. I'd posit that it'd be a one 
line change for any server maintainer to fix theirs to implement it correctly 
(i.e. use the feature's start/end to resubmit the query), it's just that it'd 
be more complicated to do it in a single step from the beginning.
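
The two-step approach mentioned above (look the feature up, then resubmit a
region query using its start/end) can be sketched as follows. This is only an
illustration over an in-memory list, not code from any DAS server: the Feature
class and the function names features_in_region and features_by_id are
hypothetical, and a real source would query its own backend instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal feature record (not a DAS library type)."""
    id: str
    segment: str
    start: int
    end: int
    type: str


def features_in_region(features: List[Feature], segment: str,
                       start: int, end: int) -> List[Feature]:
    """Ordinary region query: every feature overlapping segment:start,end."""
    return [f for f in features
            if f.segment == segment and f.start <= end and f.end >= start]


def features_by_id(features: List[Feature], feature_id: str) -> List[Feature]:
    """Step 1: find the feature(s) with this ID.
    Step 2: reuse each hit's coordinates as a plain region query."""
    result: List[Feature] = []
    for hit in (f for f in features if f.id == feature_id):
        for f in features_in_region(features, hit.segment, hit.start, hit.end):
            if f not in result:  # avoid duplicates when hits overlap
                result.append(f)
    return result
```

The second step is just a call to the existing region query, which is why the
fix is small once a server already supports segment requests.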

We should be under no illusions though that people are going to be able to 
implement this easily without reading the documentation carefully, no matter 
which option is chosen. In particular, I can foresee servers not interpreting 
the "type" filter appropriately: they are likely to process the adjacent query 
and then apply the type filter, which would be wrong. I have a feeling most 
sources implement the type filter as a passive "post filter" rather than an 
active one. 
I can tell you right now that it is going to be really quite difficult for me 
to implement "adjacent" correctly for the ASTD gene/transcript/exon sources, 
and I suspect the same will be true for retrofitting lots of other sources.
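
To illustrate the difference between a passive "post filter" and an active
type filter, here is a small sketch. It assumes, purely for illustration, that
"adjacent" means the nearest feature on either side of a position within one
segment; the exact semantics are still under discussion in this thread. The
Feature class and all function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Feature:
    """Hypothetical minimal feature record on a single segment."""
    id: str
    start: int
    end: int
    type: str


def nearest_either_side(features: List[Feature], position: int) -> List[Feature]:
    """Assumed semantics: closest feature ending before the position and
    closest feature starting after it."""
    left = [f for f in features if f.end < position]
    right = [f for f in features if f.start > position]
    result = []
    if left:
        result.append(max(left, key=lambda f: f.end))
    if right:
        result.append(min(right, key=lambda f: f.start))
    return result


def adjacent_post_filter(features, position, wanted_type):
    """Passive: find neighbours of any type, then discard the wrong types."""
    return [f for f in nearest_either_side(features, position)
            if f.type == wanted_type]


def adjacent_active_filter(features, position, wanted_type):
    """Active: restrict to the wanted type first, then find neighbours."""
    typed = [f for f in features if f.type == wanted_type]
    return nearest_either_side(typed, position)


features = [
    Feature("e1", 100, 200, "exon"),
    Feature("r1", 300, 400, "repeat"),
    Feature("r2", 600, 700, "repeat"),
    Feature("e2", 800, 900, "exon"),
]
post = adjacent_post_filter(features, 500, "exon")
active = adjacent_active_filter(features, 500, "exon")
```

With this data the post filter returns nothing, because both nearest
neighbours of position 500 are repeats, whereas the active filter returns the
exons e1 and e2.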

>> 
>> As to feature-by-id, I know changing behaviour is potentially a very 
>> disruptive change, but I think we can potentially do this purely because 
>> servers don't tend to implement it correctly anyway. Clients can happily 
>> filter out any additional features returned by old servers, and if any 
>> clients are reliant on the server including all overlapping features then as 
>> far as I am concerned they are either a) targeting specific servers rather 
>> than DAS-wide and thus unaffected, or b) already broken :)
> So you agree feature-by-id should be changed if we have the stomach for it? - 
> good and Gustavo too. Well done Andy - You have just agreed to write Spec 1.7 
> or 3??? ;) Your argument above can be used for leaving the spec as it is then 
> as well - but ideally I agree and guess we can call it spec 1.61 assuming 
> other people agree.

I already have a small list of changes for DAS 1.7 or whatever and think it's 
fine for that context. In any case, let's keep these two issues separate as 
Thomas says.

>> 
>> I have to admit that the feature-by-id capability is one of the (many) 
>> things I loathe having to explain and would love to change it. Doing so 
>> would be consistent with what we were trying to do with 1.6 (i.e. 
>> rationalise existing use of the spec) but I chickened out really.
>> 
>> Cheers,
>> Andy
> 
> Jonathan Warren
> Senior Developer and DAS coordinator
> blog: http://biodasman.wordpress.com/
> [email protected]
> Ext: 2314
> Telephone: 01223 492314
> 
> -- 
> The Wellcome Trust Sanger Institute is operated by Genome Research Limited, a 
> charity registered in England with number 1021457 and a company registered in 
> England with number 2742969, whose registered office is 215 Euston Road, 
> London, NW1 2BE.


_______________________________________________
DAS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/das
