Hi Steffen - thanks for your mail!

Steffen Schmidt wrote:
> In your manual about annotation files you describe:
> http://www.jalview.org/help/oldhelp/html/features/annotationsFormat.html
>
> ...
> You can associate an annotation with a sequence by preceding its
> definition with the line:
>
> SEQUENCE_REF	seq_name	[startIndex]
> ...
>
> I wonder what the exact format of seq_name is:
>
> Imagine I get a fasta file like this:
> >db|183474|my_pet_protein
>
> Do I have to put in the full id or are other variations ok?
>
> SEQUENCE_REF	db|183474|my_pet_protein	1
> SEQUENCE_REF	183474	1
> SEQUENCE_REF	my_pet_protein	1
>
> Background: Since most often accession numbers don’t tell you the
> species name, I would like to add the species info to the sequence
> name to quickly spot the organism. e.g.
> my_pet_protein|Escherichia_coli. But then, I would need to change the
> annotation file seq_name if I can’t use a shorthand…

Jalview's annotation file format uses exact string matching to associate annotation tracks with a sequence, so seq_name has to be the sequence's ID exactly as it appears in the alignment. We made that decision because the format was designed as a way for other programs to generate data for import into Jalview.
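
To make the exact-match rule concrete, here is a minimal Python sketch of the lookup. It is not Jalview's actual code: `find_sequence` and `alignment_ids` are names made up for illustration, and it assumes the whole FASTA header string from your example is taken as the sequence ID.

```python
# Hypothetical sketch, NOT Jalview's implementation: resolve the name given
# on a SEQUENCE_REF line against the sequence IDs in an alignment using
# exact string equality only.

def find_sequence(seq_ids, ref_name):
    """Return the ID that equals ref_name exactly, or None if none does."""
    for seq_id in seq_ids:
        if seq_id == ref_name:
            return seq_id
    return None


# Assumption: the full FASTA header is used as the sequence ID.
alignment_ids = ["db|183474|my_pet_protein"]

full_id = find_sequence(alignment_ids, "db|183474|my_pet_protein")
accession_only = find_sequence(alignment_ids, "183474")
name_only = find_sequence(alignment_ids, "my_pet_protein")

print(full_id)         # db|183474|my_pet_protein
print(accession_only)  # None
print(name_only)       # None
```

Under those assumptions only the first of your three SEQUENCE_REF variants would be attached to the sequence; the two shorthand forms would find nothing.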

It is reasonably straightforward to allow substring-based matching like you suggest. Jalview already does that for Newick tree import, so the function is available, and I can create a patch right away if you like. I've created a new feature request for this at http://issues.jalview.org/browse/JAL-1427

However, there might be backwards compatibility problems where an alignment includes different sequences and one sequence's ID is wholly contained in another's, so I don't think I can make substring matching the default behaviour when parsing the SEQUENCE_REF tag in annotation files.

Any thoughts?

Jim.
_______________________________________________
Jalview-discuss mailing list
[email protected]
http://www.compbio.dundee.ac.uk/mailman/listinfo/jalview-discuss
