I have found one public program that does that very thing: Sequin, at http://www.ncbi.nlm.nih.gov/Sequin/. It has quite a nice feature that checks the differences between the old and new sequence, lets you scroll through them, and then performs the update. The input is a FASTA file plus a tab-separated text file with annotations. Sequin then creates a GenBank entry that can be updated with new sequence in FASTA format. I don't know how they do it or what programming language they use, but if you are interested, maybe they will let you look at their algorithms.
Regards,
Marcus

On Tue, 2003-09-02 at 10:08, Kim Rutherford wrote:
> On Fri, 29 Aug 2003 12:09:46 -0700, Philip Hugenholtz wrote:
>
> > Hi All
> > We've been annotating partial genome contigs in artemis and now have
> > those contigs assembled into one genome scaffold.
> >
> > Is there a simple way to change all of the ORF CDS coordinates in the
> > contigs at once to reflect the restructuring of the contigs into a
> > larger scaffold? (in artemis directly, or maybe a perlscript exists??)
>
> Hi.
>
> Artemis isn't able to re-map feature coordinates. This is quite a hard
> problem to solve because you need to deal with all the possible ways in
> which a contig can change between assembles. Potentially any
> insertion, deletion, split or join is possible.
>
> If you already know how the coordinates on the old contigs map to the
> new contigs, you should be able to update the coordinates with a bit of
> Perl. If not, you'll need to use a sequence comparison program
> (eg. Cross_match or BLASTN) to work out the mapping. The Ensembl
> project (http://www.ensembl.org/) does this for eukaryotic genomes
> using Cross_match (I believe) and the code is free. The down side is
> that Ensembl is a large package and so it may be overkill for your
> situation
>
> Kim.
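
P.S. For the simple case Kim mentions, where you already know how each old contig maps onto the scaffold, the update is mostly arithmetic. Below is a rough sketch in Python rather than Perl. It assumes each contig was placed into the scaffold whole, with no insertions, deletions, splits or joins inside it, and that coordinates are 1-based and inclusive as in EMBL/GenBank feature tables. The function name, the layout table and the contig names are all made up for illustration; none of this comes from Artemis or Sequin.

```python
# Sketch only: assumes contigs are placed whole (no internal indels),
# and that coordinates are 1-based inclusive. All names are hypothetical.

def remap_feature(start, end, strand, offset, contig_len, reverse=False):
    """Map a feature from contig coordinates to scaffold coordinates.

    offset     -- number of scaffold bases before the contig's first base
    contig_len -- length of the contig
    reverse    -- True if the contig is reverse-complemented in the scaffold
    """
    if not reverse:
        # Forward placement: a plain shift, strand unchanged.
        return start + offset, end + offset, strand
    # Reverse placement: contig position p lands at offset + contig_len - p + 1,
    # so start and end swap roles and the strand flips.
    new_start = offset + contig_len - end + 1
    new_end = offset + contig_len - start + 1
    return new_start, new_end, -strand


# Hypothetical layout: contig name -> (offset, length, reversed?)
layout = {
    "contigA": (0, 2000, False),
    "contigB": (5000, 1000, True),
}

# Hypothetical CDS features: (contig, start, end, strand)
features = [
    ("contigA", 150, 900, 1),
    ("contigB", 101, 400, 1),
]

remapped = []
for contig, start, end, strand in features:
    offset, length, is_reversed = layout[contig]
    remapped.append(remap_feature(start, end, strand, offset, length, is_reversed))

for contig_feature, new_coords in zip(features, remapped):
    print(contig_feature, "->", new_coords)
```

If contigs changed internally between assemblies, a plain offset like this is not enough and you would need the alignment-based mapping Kim describes.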
