Hi Nataliya - still catching up on my email backlog after being sick and on holiday (both at the same time :( ).
On Mon Nov 26 10:03:11 2012, Nataliya Sherstneva wrote:
> I've finished my code and pushed it into central repository (there is
> a new branch, based on recent Release_2_8).
> I've changed parsing() as we agreed, developed print() in the
> StockholmFile class and slightly changed code in the
> AppletFormatAdapter class.

Great!

> I haven't pushed my test code yet. If you'd like I can do it as well.

It's probably worth doing that - I'm trying to get into the habit of making test cases, and have found that it is useful for others to see how to use the code you develop. Don't worry if they are untidy! You (or someone else) can always improve them later :)

> And about SequenceI.getDBRef(). I've seen in file in Stockholm Format
> lines like this:
> #=GS O31698/18-71 AC O31698
> #=GS O83071/192-246 AC O83071
> #=GS O83071/259-312 AC O83071
>
> but the StockholmFile parsing method doesn't save it. Do you mean another
> format can save this DBRef
> and I should check and print its?

Ah, yes. This is another issue. One of the problems with Stockholm format is that, for some Stockholm files, additional information is needed in order to identify which database a particular 'AC' annotation refers to. This is exactly the problem with Rfam and Pfam - see this bug: http://issues.jalview.org/browse/JAL-851

I took another look around at the bunch of tools/databases where Stockholm is supported, and I think there isn't going to be a perfect solution.

For parsing:

1. If there are records like:
    #=GS DR Uniprot; O31698
then we can create a full DBRef object immediately.

2. If there are only AC records, then there needs to be an 'assume database name' variable - say we call it defaultDB. This could be set by the code that constructs jalview.io.StockholmFile using a get/set method, or be set automatically if it looks like we are accessing an Xfam database, e.g. if the file contains an alignment database reference like:
    #=GF AC PF......
then we can assume it's an alignment file originally from Pfam (all Pfam alignments have IDs like PF012345...), and the database accessions will most likely be Uniprot database accessions. Alternatively, if it has:
    #=GF AC RF......
then it's most likely to be an Rfam alignment. The default database here is more tricky - but in many cases, it will be an EMBL accession.

Jim.
_______________________________________________
Jalview-dev mailing list
[email protected]
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-dev
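
For reference, the two parsing rules in the message above could be sketched roughly as follows. This is only an illustration, not the actual jalview.io.StockholmFile code: the class StockholmDbRefSketch, the DbRef holder and both method names are hypothetical, the "Uniprot" and "EMBL" strings are placeholder database names, and PF00001 is a made-up accession. It also assumes every #=GS line has the layout "#=GS <seqname> <feature> <value>", as in the quoted AC lines, and it reads "only AC records" per sequence, which is one possible interpretation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the DR/AC handling; not Jalview's real parser. */
class StockholmDbRefSketch {

    /** Minimal stand-in for a database cross-reference (hypothetical type). */
    static final class DbRef {
        final String seqName;
        final String source;
        final String accession;

        DbRef(String seqName, String source, String accession) {
            this.seqName = seqName;
            this.source = source;
            this.accession = accession;
        }
    }

    /** Guesses the default database from a "#=GF AC" line; null if no guess. */
    static String guessDefaultDb(List<String> lines) {
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length >= 3 && f[0].equals("#=GF") && f[1].equals("AC")) {
                if (f[2].startsWith("PF")) {
                    return "Uniprot"; // looks like Pfam (assumed name)
                }
                if (f[2].startsWith("RF")) {
                    return "EMBL"; // looks like Rfam; only a likely guess
                }
            }
        }
        return null;
    }

    /** Builds references from #=GS DR lines, falling back to AC lines. */
    static List<DbRef> parseDbRefs(List<String> lines, String defaultDb) {
        // An explicitly supplied defaultDb wins over the #=GF AC heuristic.
        String assumedDb = defaultDb != null ? defaultDb : guessDefaultDb(lines);
        List<DbRef> refs = new ArrayList<>();
        Set<String> seqsWithDr = new HashSet<>();

        // Rule 1: a DR record names its database, so it is used directly.
        for (String line : lines) {
            String[] f = line.trim().split("\\s+", 4);
            if (f.length == 4 && f[0].equals("#=GS") && f[2].equals("DR")) {
                String[] parts = f[3].split(";");
                if (parts.length >= 2) {
                    refs.add(new DbRef(f[1], parts[0].trim(), parts[1].trim()));
                    seqsWithDr.add(f[1]);
                }
            }
        }

        // Rule 2: a bare AC record needs the assumed database name.
        for (String line : lines) {
            String[] f = line.trim().split("\\s+", 4);
            if (f.length == 4 && f[0].equals("#=GS") && f[2].equals("AC")
                    && assumedDb != null && !seqsWithDr.contains(f[1])) {
                refs.add(new DbRef(f[1], assumedDb, f[3].trim()));
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "#=GF AC PF00001",
                "#=GS O31698/18-71 AC O31698",
                "#=GS O83071/192-246 DR Uniprot; O83071");
        for (DbRef ref : parseDbRefs(lines, null)) {
            System.out.println(ref.seqName + " -> " + ref.source + ":" + ref.accession);
        }
    }
}
```

Passing a non-null defaultDb plays the role of the get/set method mentioned above; if it is null and no #=GF AC line gives a hint, AC records are simply skipped rather than assigned to a guessed database.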
