Hi all.

David Roldán Martínez, who is currently working on a GenBank parser in
Jalview, noticed some problems with the way Jalview represents sequence
features (quoted below). This prompted me to write down some thoughts on
what we need for Jalview 3.
> This object [jalview.datamodel.SequenceFeature] has begin and end
> positions referring to the positional nature of the feature. However,
> what happens when the location is not a simple range but a join or
> the complement of a range, as can happen in GenBank (not sure what
> happens in other file formats)? Or is there any way to translate
> these kinds of ranges to a begin..end one?
This deficiency is mostly covered by
http://issues.jalview.org/browse/JAL-1191 (support for hierarchies of
features and feature groups), but there is a complication: Jalview has
a second way of representing coding region annotation.
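
To make David's question concrete: a GenBank location such as
complement(join(12..78,134..202)) describes an ordered list of segments,
here read on the reverse strand, and collapsing it to a single begin..end
pair keeps only the outer extent. The following is a minimal sketch
assuming a hypothetical GenBankLocationSketch class (not part of Jalview);
it only understands the a..b, join(...) and complement(...) forms and
ignores partial ends (<, >), order(...) and references to other entries.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch, not Jalview code: parses the simple GenBank location
 * forms a..b, join(...) and complement(...) into an ordered list of segments.
 * Partial ends (<, >), order(...) and remote accessions are not handled.
 */
public class GenBankLocationSketch {
    /** One contiguous stretch of the location, 1-based inclusive. */
    static final class Segment {
        final int start, end;
        final boolean reverse; // true if read from the complementary strand

        Segment(int start, int end, boolean reverse) {
            this.start = start;
            this.end = end;
            this.reverse = reverse;
        }

        @Override
        public String toString() {
            return start + ".." + end + (reverse ? "(-)" : "(+)");
        }
    }

    static List<Segment> parse(String loc) {
        loc = loc.trim();
        if (loc.startsWith("complement(") && loc.endsWith(")")) {
            List<Segment> inner = parse(loc.substring("complement(".length(), loc.length() - 1));
            List<Segment> out = new ArrayList<>();
            // complement flips the strand and reverses the reading order
            for (int i = inner.size() - 1; i >= 0; i--) {
                Segment s = inner.get(i);
                out.add(new Segment(s.start, s.end, !s.reverse));
            }
            return out;
        }
        List<Segment> out = new ArrayList<>();
        if (loc.startsWith("join(") && loc.endsWith(")")) {
            for (String part : splitTopLevel(loc.substring("join(".length(), loc.length() - 1))) {
                out.addAll(parse(part));
            }
            return out;
        }
        // a simple range a..b, or a single base position
        String[] ends = loc.split("\\.\\.");
        int a = Integer.parseInt(ends[0].trim());
        int b = ends.length > 1 ? Integer.parseInt(ends[1].trim()) : a;
        out.add(new Segment(a, b, false));
        return out;
    }

    /** Splits on commas that are not nested inside parentheses. */
    private static List<String> splitTopLevel(String s) {
        List<String> parts = new ArrayList<>();
        int depth = 0, from = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) {
                parts.add(s.substring(from, i));
                from = i + 1;
            }
        }
        parts.add(s.substring(from));
        return parts;
    }

    /** Collapses segments to the single begin..end pair a SequenceFeature can hold. */
    static int[] extent(List<Segment> segments) {
        int min = Integer.MAX_VALUE, max = Integer.MIN_VALUE;
        for (Segment s : segments) {
            min = Math.min(min, s.start);
            max = Math.max(max, s.end);
        }
        return new int[] {min, max};
    }

    public static void main(String[] args) {
        List<Segment> segs = parse("complement(join(12..78,134..202))");
        System.out.println(segs);               // [134..202(-), 12..78(-)]
        int[] e = extent(segs);
        System.out.println(e[0] + ".." + e[1]); // 12..202
    }
}
```

The extent 12..202 is what fits in the current begin/end fields; the gap
between the two segments and the strand are lost, which is why a single
begin..end pair is not enough for such features.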

For a series of coding regions in ENA records, Jalview creates 
jalview.datamodel.SequenceFeature objects to highlight the regions, and 
also constructs a jalview.datamodel.Mapping object which associates 
coding positions on the ENA dataset sequence with positions on the 
derived sequence (which is usually protein, but could be a transcript 
sequence). These Mapping objects are attached as DBRefEntry 
cross-reference objects on each sequence, and processed by the routines 
in jalview.analysis.CrossRef.
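
The kind of position translation such a Mapping performs can be
illustrated with a small, hypothetical class (CdsMappingSketch is not
Jalview's Mapping or MapList API): given the forward-strand CDS exon
ranges on the genomic sequence, it converts a genomic position to a
protein residue number, and a residue back to the three genomic
positions of its codon. Reverse-strand CDSs and frame offsets are left
out for brevity.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch, not Jalview's Mapping/MapList API: maps positions on a
 * forward-strand genomic sequence to residue positions on the translated
 * protein, given the CDS exon ranges (1-based, inclusive, ascending).
 */
public class CdsMappingSketch {
    private final int[][] exons;

    CdsMappingSketch(int[][] exons) {
        this.exons = exons;
    }

    /** Protein residue (1-based) encoded at genomic position pos, or -1 if pos is not coding. */
    int toProtein(int pos) {
        int offset = 0; // coding bases in the exons already passed
        for (int[] ex : exons) {
            if (pos >= ex[0] && pos <= ex[1]) {
                int cdsIndex = offset + (pos - ex[0]); // 0-based index into the spliced CDS
                return cdsIndex / 3 + 1;
            }
            offset += ex[1] - ex[0] + 1;
        }
        return -1;
    }

    /** Genomic positions of the three bases of codon `residue` (1-based). */
    int[] codonFor(int residue) {
        int[] positions = new int[3];
        for (int k = 0; k < 3; k++) {
            positions[k] = toGenomic((residue - 1) * 3 + k);
        }
        return positions;
    }

    /** Converts a 0-based index into the spliced CDS back to a genomic position. */
    private int toGenomic(int cdsIndex) {
        for (int[] ex : exons) {
            int length = ex[1] - ex[0] + 1;
            if (cdsIndex < length) {
                return ex[0] + cdsIndex;
            }
            cdsIndex -= length;
        }
        throw new IndexOutOfBoundsException("position lies beyond the end of the CDS");
    }

    public static void main(String[] args) {
        CdsMappingSketch m = new CdsMappingSketch(new int[][] {{12, 78}, {134, 201}});
        System.out.println(m.toProtein(134));                // 23
        System.out.println(Arrays.toString(m.codonFor(23))); // [78, 134, 135]
        System.out.println(m.toProtein(100));                // -1 (intron)
    }
}
```

In the example, codon 23 is split across the intron (78, then 134..135),
which is exactly the kind of relationship a single begin..end feature
cannot express.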

> Additionally, some fields (e.g. otherDetails) are publicly
> accessible, whereas common practice is to hide them and provide
> access through getters/setters.
This is something that needs to be looked at carefully. We originally
avoided getters/setters for some fields in order to avoid method-call
overheads when accessing their data. However, for OSGi we will need to
create an interface for sequence feature objects, so getters/setters
will be unavoidable.
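
A minimal sketch of what such an interface might look like, assuming
hypothetical names (SequenceFeatureI, SimpleSequenceFeature,
getValue/setValue); this is not an existing Jalview API. The otherDetails
map becomes private and is reached only through accessors, so other
implementations (e.g. proxies in an OSGi bundle) are free to store
attributes differently.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not Jalview code: a feature interface whose extra
 * attributes (the old public otherDetails map) are only reachable through
 * accessor methods, so alternative implementations can store them differently.
 */
public class FeatureAccessSketch {
    interface SequenceFeatureI {
        String getType();
        int getBegin();
        int getEnd();
        Object getValue(String key);
        void setValue(String key, Object value);
    }

    static final class SimpleSequenceFeature implements SequenceFeatureI {
        private final String type;
        private final int begin, end;
        private final Map<String, Object> otherDetails = new HashMap<>(); // no longer public

        SimpleSequenceFeature(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override public String getType() { return type; }
        @Override public int getBegin() { return begin; }
        @Override public int getEnd() { return end; }
        @Override public Object getValue(String key) { return otherDetails.get(key); }
        @Override public void setValue(String key, Object value) { otherDetails.put(key, value); }
    }

    public static void main(String[] args) {
        SequenceFeatureI f = new SimpleSequenceFeature("CDS", 12, 201);
        f.setValue("ID", "cds1");
        f.setValue("Parent", "mRNA1");
        // prints: CDS cds1 parent=mRNA1
        System.out.println(f.getType() + " " + f.getValue("ID") + " parent=" + f.getValue("Parent"));
    }
}
```

On HotSpot JVMs, trivial accessors like these are usually inlined by the
JIT compiler once they are hot, so the call overhead that originally
motivated public fields may be small in practice; it would be worth
measuring before deciding.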

TODO
====

As I see it, there are two issues to focus on:

1. How to represent complex features

Most systems employ some kind of linking model (e.g. 
'parent'/'children'/'edge' type links).

This fits well with GFF3: http://gmod.org/wiki/GFF3
To summarise:
* A 'Parent' field links a feature to a parent feature that has a
matching 'ID' field.
* Several lines sharing the same ID field together describe a single
discontinuous feature (e.g. a CDS split across exons); features that
share the same Parent are 'siblings'.
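
As a sketch of how these attributes could be turned into a hierarchy,
assuming a hypothetical Gff3HierarchySketch class and that every row
carries an ID (rows without one, which GFF3 allows, would need synthetic
IDs): rows sharing an ID are merged into one feature with several
segments, and each feature is attached beneath every feature named in
its (possibly comma-separated) Parent attribute.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch, not Jalview code: builds a feature hierarchy from GFF3
 * ID/Parent attributes. Rows sharing an ID become one feature with several
 * segments; every row is assumed to carry an ID.
 */
public class Gff3HierarchySketch {
    /** The fields of one GFF3 line that this sketch uses; parent may be null. */
    static final class Row {
        final String type, id, parent;
        final int start, end;

        Row(String type, String id, String parent, int start, int end) {
            this.type = type; this.id = id; this.parent = parent; this.start = start; this.end = end;
        }
    }

    static final class Feature {
        final String id, type;
        final List<int[]> segments = new ArrayList<>();
        final List<Feature> children = new ArrayList<>();

        Feature(String id, String type) { this.id = id; this.type = type; }
    }

    /** Returns the top-level features, with children linked beneath them. */
    static List<Feature> build(List<Row> rows) {
        Map<String, Feature> byId = new LinkedHashMap<>();
        for (Row r : rows) {
            byId.computeIfAbsent(r.id, k -> new Feature(k, r.type)).segments.add(new int[] {r.start, r.end});
        }
        List<Feature> roots = new ArrayList<>();
        Set<String> linked = new HashSet<>();
        for (Row r : rows) {
            if (!linked.add(r.id)) {
                continue; // a multi-row feature is linked once, using its first row's Parent
            }
            Feature f = byId.get(r.id);
            boolean attached = false;
            if (r.parent != null) {
                // GFF3 allows several comma-separated parents
                for (String p : r.parent.split(",")) {
                    Feature parent = byId.get(p.trim());
                    if (parent != null) {
                        parent.children.add(f);
                        attached = true;
                    }
                }
            }
            if (!attached) {
                roots.add(f); // no resolvable parent: treat as top level
            }
        }
        return roots;
    }

    static void print(Feature f, String indent) {
        System.out.println(indent + f.type + " " + f.id + " segments=" + f.segments.size());
        for (Feature c : f.children) {
            print(c, indent + "  ");
        }
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("gene", "gene1", null, 1000, 9000),
            new Row("mRNA", "mRNA1", "gene1", 1050, 9000),
            new Row("CDS", "cds1", "mRNA1", 1201, 1500),
            new Row("CDS", "cds1", "mRNA1", 3000, 3902));
        for (Feature root : build(rows)) {
            print(root, "");
        }
    }
}
```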

The question is how these should be managed in Jalview. Currently,
Jalview holds features as simple lists, which are not efficient for
finding the features in a given region (see nested containment lists:
http://www.jalview.org/pipermail/jalview-dev/2012-June/000220.html ).
Complex relationships can be represented via the ID and Parent fields
stored in 'otherDetails' (which are parsed from GFF files), but nothing
currently operates on them.
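
For reference, the nested containment list idea can be sketched in a few
dozen lines. This is a hypothetical NCListSketch class, not Jalview code,
and it is built once from a fixed set of intervals (no insertion or
deletion). Intervals contained in another become its children, so within
each sibling list both starts and ends increase, and an overlap query can
binary-search each list and descend only into intervals that themselves
overlap the query.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical sketch of a nested containment list (NCList): intervals
 * contained in another interval are stored as its children, so every
 * sibling list is sorted by both start and end and can be binary-searched.
 * Built once from a fixed set; insertion/removal is not supported.
 */
public class NCListSketch {
    static final class Node {
        final int start, end; // 1-based inclusive
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(int start, int end, String name) { this.start = start; this.end = end; this.name = name; }
    }

    private final List<Node> top = new ArrayList<>();

    NCListSketch(List<Node> intervals) {
        List<Node> sorted = new ArrayList<>(intervals);
        // start ascending; for equal starts, longer intervals first so they become containers
        sorted.sort((a, b) -> a.start != b.start ? Integer.compare(a.start, b.start)
                                                 : Integer.compare(b.end, a.end));
        Deque<Node> open = new ArrayDeque<>(); // chain of nested intervals that may contain the next one
        for (Node n : sorted) {
            while (!open.isEmpty() && open.peek().end < n.end) {
                open.pop(); // does not contain n
            }
            if (open.isEmpty()) top.add(n); else open.peek().children.add(n);
            open.push(n);
        }
    }

    /** Names of all intervals overlapping [from, to]. */
    List<String> overlapping(int from, int to) {
        List<String> hits = new ArrayList<>();
        collect(top, from, to, hits);
        return hits;
    }

    private static void collect(List<Node> level, int from, int to, List<String> hits) {
        // ends increase along a sibling list, so binary-search the first end >= from
        int lo = 0, hi = level.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (level.get(mid).end < from) lo = mid + 1; else hi = mid;
        }
        for (int i = lo; i < level.size() && level.get(i).start <= to; i++) {
            Node n = level.get(i);
            hits.add(n.name);
            collect(n.children, from, to, hits); // children lie inside n, so only search them on a hit
        }
    }

    public static void main(String[] args) {
        NCListSketch index = new NCListSketch(List.of(
            new Node(100, 900, "gene"),
            new Node(100, 200, "exon1"),
            new Node(150, 480, "domain"),
            new Node(400, 500, "exon2"),
            new Node(950, 950, "snp")));
        System.out.println(index.overlapping(420, 430)); // [gene, domain, exon2]
        System.out.println(index.overlapping(950, 960)); // [snp]
    }
}
```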

Ultimately, a family of SequenceFeatureI and FeatureCollectionI type
interfaces is needed to interact with and manipulate simple and complex
features. These need to be scalable, since the associated sequence may
be a chromosome or genomic contig, and compatible with database- or
file-backed array storage.
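
One possible shape for such interfaces, as a hypothetical sketch (the
interface names follow the ones above, but every method is an
assumption): the collection passes overlapping features to a visitor
rather than returning a list, so a database- or file-backed
implementation could stream results instead of loading a whole
chromosome's features, and an indexed implementation could replace the
naive list scan shown here without changing callers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch, not Jalview code: callers query features through
 * interfaces only, so the storage behind FeatureCollectionI could be a list,
 * an NCList, a file or a database without changing client code.
 */
public class FeatureCollectionSketch {
    /** Minimal feature contract for this sketch. */
    interface SequenceFeatureI {
        String getType();
        int getBegin();
        int getEnd();
        List<SequenceFeatureI> getChildren(); // sub-features, e.g. exons of a transcript
    }

    interface FeatureCollectionI {
        void add(SequenceFeatureI feature);

        /** Passes each top-level feature overlapping [from, to] to the visitor, without building a result list. */
        void forEachOverlapping(int from, int to, Consumer<SequenceFeatureI> visitor);
    }

    static final class SimpleFeature implements SequenceFeatureI {
        private final String type;
        private final int begin, end;
        private final List<SequenceFeatureI> children = new ArrayList<>();

        SimpleFeature(String type, int begin, int end) { this.type = type; this.begin = begin; this.end = end; }

        @Override public String getType() { return type; }
        @Override public int getBegin() { return begin; }
        @Override public int getEnd() { return end; }
        @Override public List<SequenceFeatureI> getChildren() { return children; }
    }

    /** Naive in-memory implementation: a linear scan over a list. */
    static final class ListFeatureCollection implements FeatureCollectionI {
        private final List<SequenceFeatureI> features = new ArrayList<>();

        @Override public void add(SequenceFeatureI feature) { features.add(feature); }

        @Override public void forEachOverlapping(int from, int to, Consumer<SequenceFeatureI> visitor) {
            for (SequenceFeatureI f : features) {
                if (f.getBegin() <= to && f.getEnd() >= from) {
                    visitor.accept(f);
                }
            }
        }
    }

    public static void main(String[] args) {
        SimpleFeature mrna = new SimpleFeature("mRNA", 100, 900);
        mrna.getChildren().add(new SimpleFeature("exon", 100, 200));
        mrna.getChildren().add(new SimpleFeature("exon", 400, 500));
        FeatureCollectionI collection = new ListFeatureCollection();
        collection.add(mrna);
        collection.add(new SimpleFeature("SNP", 950, 950));
        // prints: mRNA with 2 children
        collection.forEachOverlapping(420, 430, f ->
            System.out.println(f.getType() + " with " + f.getChildren().size() + " children"));
    }
}
```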

2. What needs to change in the GUI/rendering system to visualise and 
interact with complex features.

This is a more complex issue. There are a number of issues related to
bulk editing of individual features, searching features, etc., and
ideally hierarchical features should be handled in a similar way. The
first question to ask here is: what are the must-have feature display
and editing capabilities?

OK, that's my braindump for the moment. I don't plan on introducing any
changes to the datamodel for 2.8.1 or 2.8.2, but we do need changes for
v3 to support import/export of data from ENSEMBL and GFF3. Any
thoughts?

Jim

_______________________________________________
Jalview-dev mailing list
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev