Hi Nataliya - still catching up on my email backlog after being sick and 
on holiday (both at the same time :( ).

On Mon Nov 26 10:03:11 2012, Nataliya Sherstneva wrote:
> I've finished my code and pushed it into central repository (there is
> a new branch, based on recent Release_2_8).
> I've changed parsing() as we agreed, developed print() in the
> StockholmFile class and slightly changed code in the
> AppletFormatAdapter class.
great!
>
> I haven't pushed my test code yet. If you'd like I can do it as well.
It's probably worth doing that - I'm trying to get into the habit of 
making test cases, and have found that it is useful for others to see how 
to use the code you develop. Don't worry if they are untidy! You (or 
someone else) can always improve them later :)

> And about SequenceI.getDBRef(). I've seen in file in Stockholm Format
> lines like this:
> #=GS O31698/18-71 AC O31698
> #=GS O83071/192-246 AC O83071
> #=GS O83071/259-312 AC O83071
>
> but the StockholmFile parsing method doesn't save it. Do you mean another 
> format can save this DBRef
>
> and I should check  and print its?

Ah, yes. This is another issue. One of the problems with Stockholm 
format is that, for some Stockholm files, additional information is 
needed in order to identify which database a particular 'AC' annotation 
refers to. This is exactly the problem with Rfam and Pfam - see this 
bug: http://issues.jalview.org/browse/JAL-851

I took another look around at the bunch of tools/databases where 
Stockholm is supported, and I think there isn't going to be a perfect 
solution.

For parsing:
1. If there are records like:
#=GS <seqname> DR Uniprot; O31698
then we can create a full DBRef object immediately.
2. If there are only AC records, then there needs to be an 'assumed 
database name' variable - say we call it defaultDB. This could be set by 
the code that constructs jalview.io.StockholmFile using a get/set 
method, or be set automatically if it looks like we are accessing an 
Xfam database, e.g.:

If the file contains an alignment database reference like:
#=GF AC PF......
then we can assume it's an alignment file originally from Pfam (all Pfam 
alignments have accessions like PF01234...), and the sequence accessions 
will most likely be Uniprot database accessions.

Alternatively, if it has:
#=GF AC RF......

then it's most likely to be an Rfam alignment. The default database here 
is trickier - but in many cases, the accessions will be EMBL accessions.
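
A rough sketch of the parsing rules above, in case it helps to see them 
as code. Everything here is an assumption made for illustration: the 
class and method names (StockholmDbRefSketch, guessDefaultDb, 
parseGsLine, Ref) are hypothetical and are not Jalview's API, and the 
"UNIPROT" and "EMBL" strings are placeholder database names. It only 
shows the decision logic: DR records give a full reference, AC records 
fall back on defaultDB.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper; NOT part of jalview.io.StockholmFile. */
final class StockholmDbRefSketch {
    // "#=GF AC PF......" or "#=GF AC RF......": accession of the alignment itself
    private static final Pattern GF_AC =
            Pattern.compile("^#=GF\\s+AC\\s+(\\S+)");
    // "#=GS <seqname> AC <accession>": accession only, database unknown
    private static final Pattern GS_AC =
            Pattern.compile("^#=GS\\s+(\\S+)\\s+AC\\s+(\\S+)");
    // "#=GS <seqname> DR <database>; <accession>": database named explicitly
    private static final Pattern GS_DR =
            Pattern.compile("^#=GS\\s+(\\S+)\\s+DR\\s+([^;\\s]+)\\s*;\\s*([^;\\s]+)");

    /** Minimal stand-in for a database reference (assumed shape). */
    static final class Ref {
        final String seqName;
        final String source;
        final String accession;

        Ref(String seqName, String source, String accession) {
            this.seqName = seqName;
            this.source = source;
            this.accession = accession;
        }
    }

    /** Guess defaultDB from a "#=GF AC" line; empty if no guess is possible. */
    static Optional<String> guessDefaultDb(String line) {
        Matcher m = GF_AC.matcher(line);
        if (!m.find()) {
            return Optional.empty();
        }
        String familyAccession = m.group(1);
        if (familyAccession.startsWith("PF")) {
            return Optional.of("UNIPROT"); // Pfam: most likely Uniprot accessions
        }
        if (familyAccession.startsWith("RF")) {
            return Optional.of("EMBL"); // Rfam: often, but not always, EMBL
        }
        return Optional.empty();
    }

    /**
     * Build a reference from a "#=GS" line. DR lines carry their own
     * database name; AC lines need a non-null defaultDb to be usable.
     */
    static Optional<Ref> parseGsLine(String line, String defaultDb) {
        Matcher dr = GS_DR.matcher(line);
        if (dr.find()) {
            return Optional.of(new Ref(dr.group(1), dr.group(2), dr.group(3)));
        }
        Matcher ac = GS_AC.matcher(line);
        if (ac.find() && defaultDb != null) {
            return Optional.of(new Ref(ac.group(1), defaultDb, ac.group(2)));
        }
        return Optional.empty();
    }
}
```

With this shape, a caller would scan the header for a "#=GF AC" line 
first, keep any guess as defaultDB (unless one was set explicitly), and 
then pass it to parseGsLine for each "#=GS" line. An AC line with no 
defaultDB yields nothing, which matches the current behaviour of not 
saving it.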

Jim.

_______________________________________________
Jalview-dev mailing list
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev
