I would suggest using the BioPerl modules for parsing GFF and FASTA files:
Bio::Tools::GFF and Bio::SeqIO. This should save you a lot of pain.

-Jason

On Sat, Oct 3, 2009 at 5:59 AM, Mike Marchywka <[email protected]> wrote:
>
> ----------------------------------------
> > Date: Sat, 3 Oct 2009 12:54:51 +0100
> > From: [email protected]
> > To: [email protected]
> > Subject: Re: [BiO BB] gff to sequence
> >
> > You can do this easily in Perl... Here is some 'pseudo code' to
> > (roughly) do it...
> >
> >
> > ## Get a hash of sequences, keys = IDs, values = sequence strings
> > my %sequences;
> > ...
> >
> > # open the GFF file ...
> >
> > # read one GFF line at a time (filehandle name GFF is assumed)
> > while (my $gff = <GFF>) {
> >     chomp $gff;
> >     my @gffcols = split(/\t/, $gff);
> >
> >     # GFF start/end are 1-based and inclusive; substr offsets are 0-based
> >     print substr($sequences{$gffcols[0]}, $gffcols[3] - 1,
> >         $gffcols[4] - $gffcols[3] + 1), "\n";
> >     ...
> > }
> >
> >
> > Or something roughly similar to the above ;-)
> >
> > Dan.
> >
> >
> > 2009/10/3 Kie Kyon Huang:
> >> Hi,
> >>
> >> Is there a way to quickly extract out the coordinates from a gff file
> >> and the corresponding sequence from a fasta file?
> >>
>
> I guess it depends what you mean by quick. If you mean quick to write, you
> could use awk, but then it depends what additional things you want to do
> with the results. I ended up writing a C++ FASTA utility program since Perl
> can slow down sometimes, but I ended up grabbing a couple of regex
> libraries to let me grep names etc.
>
> _______________________________________________
> BBB mailing list
> [email protected]
> http://www.bioinformatics.org/mailman/listinfo/bbb
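
For anyone finding this thread later, here is a minimal sketch of what the Bio::Tools::GFF plus Bio::SeqIO suggestion could look like. It is not code from anyone in the thread. It assumes the GFF file is GFF3 (change -gff_version otherwise), that the first column of the GFF matches the FASTA IDs, and that the sequences are nucleotide, so minus-strand features can be reverse-complemented. The sub names load_sequences and extract_features and the output header format are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;
use Bio::Tools::GFF;

# Load every FASTA record into a hash keyed by sequence ID.
sub load_sequences {
    my ($fasta_file) = @_;
    my $in = Bio::SeqIO->new(-file => $fasta_file, -format => 'fasta');
    my %seq_for;
    while (my $seq = $in->next_seq) {
        $seq_for{ $seq->display_id } = $seq;
    }
    return \%seq_for;
}

# Return one [seq_id, start, end, strand, sequence] row per GFF feature.
sub extract_features {
    my ($gff_file, $fasta_file) = @_;
    my $seq_for = load_sequences($fasta_file);

    # Assumption: GFF3 input; use -gff_version => 2 for older files.
    my $gffio = Bio::Tools::GFF->new(-file => $gff_file, -gff_version => 3);

    my @rows;
    while (my $feat = $gffio->next_feature) {
        my $seq = $seq_for->{ $feat->seq_id };
        next unless defined $seq;               # ID not present in the FASTA
        next if $feat->end > $seq->length;      # coordinates out of range

        # trunc() takes 1-based inclusive coordinates, the same as GFF.
        my $sub    = $seq->trunc($feat->start, $feat->end);
        my $strand = $feat->strand // 0;
        $sub = $sub->revcom if $strand == -1;   # minus strand: reverse complement

        push @rows, [ $feat->seq_id, $feat->start, $feat->end, $strand, $sub->seq ];
    }
    $gffio->close;
    return @rows;
}

# Command-line use: perl gff2seq.pl features.gff genome.fasta
if (@ARGV == 2) {
    my ($gff_file, $fasta_file) = @ARGV;
    for my $row (extract_features($gff_file, $fasta_file)) {
        my ($id, $start, $end, $strand, $dna) = @$row;
        print ">$id:$start-$end($strand)\n$dna\n";
    }
}
```

Letting BioPerl do the slicing avoids the two classic mistakes in hand-rolled versions: forgetting that GFF coordinates are 1-based and inclusive, and forgetting the strand column.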
