> >> Is there a way to quickly extract out the coordinates from a gff file
> >> and the corresponding sequence from a fasta file?

This seems of such general use that it begs for a small utility which will take a (possibly indexed) fasta file and a gff, and output the sequences you want.

What would people want from such a programme? Is GTF (http://mblab.wustl.edu/GTF2.html) more useful, or GFF? Would different elements from the same group (gene/transcript) be joined together in order? Would one want filtering on the "feature" column, so one could retrieve all splice sites or coding exons? What would be the output? Another fasta file? How would each "group" of sequences (e.g. a transcript) be labelled? By a user-supplied regular expression?
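To make the questions above concrete, here is a rough sketch of what such a utility might look like in Python. It is only one possible set of answers, not a finished tool. Assumptions: the GFF has the usual nine tab-separated columns; the ninth (group/attributes) column is used verbatim as the label for each output sequence; the whole fasta file is read into memory rather than indexed; pieces of a group are joined by ascending start on "+" and descending start on "-". The function names `read_fasta` and `extract` are made up for illustration.

```python
from collections import defaultdict

# Translation table for complementing DNA (other characters pass through).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(lines):
    """Return {sequence name: sequence} from an iterable of FASTA lines."""
    seqs, name = {}, None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Assumption: the name is the first word of the header line.
            name = (line[1:].split() or [""])[0]
            seqs[name] = []
        elif line and name is not None:
            seqs[name].append(line)
    return {n: "".join(parts) for n, parts in seqs.items()}


def extract(fasta_lines, gff_lines, feature=None):
    """Yield (group, joined sequence) pairs, one per GFF group.

    If `feature` is given, only rows whose third column equals it are used.
    """
    seqs = read_fasta(fasta_lines)
    groups = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            continue  # not a complete GFF row
        seqname, _src, feat, start, end, _score, strand, _frame, group = cols[:9]
        if feature is not None and feat != feature:
            continue
        if seqname not in seqs:
            continue  # sequence not present in the fasta file
        # GFF coordinates are 1-based and inclusive.
        piece = seqs[seqname][int(start) - 1:int(end)]
        if strand == "-":
            piece = piece.translate(_COMPLEMENT)[::-1]  # reverse complement
        groups[group].append((int(start), strand, piece))
    for group, pieces in groups.items():
        # Join in transcript order: ascending on "+", descending on "-".
        pieces.sort(key=lambda p: p[0], reverse=(pieces[0][1] == "-"))
        yield group, "".join(p[2] for p in pieces)
```

Writing each pair back out as `">" + group` followed by the sequence would give "another fasta file" as the output; labelling by regular expression would replace the verbatim use of the ninth column.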
> I guess it depends what you mean by quick- quick to write you could use awk
> but then it depends what additional things you want to do with results.
> I ended up writing a C++ fasta utility program since PERL can slow down
> sometimes but I ended up grabbing a couple of regex libraries to let me
> grep names etc.

I hope you used boost::regex, which will be in the next C++ standard (http://www.boost.org/doc/libs/1_40_0/libs/regex/doc/html/index.html) and is as easy to use and as powerful as Perl/Python regular expressions (though C's rules on escaping backslashes are a pain).

Leo

Leo Goodstadt
_______________________________________________
BBB mailing list
[email protected]
http://www.bioinformatics.org/mailman/listinfo/bbb
