Pedro Antonio Reche wrote:
>
> Hi, I am interested in parsing the file at the bottom of this e-mail in
> order to extract the string between "" following /product=,
> /protein_id=, /db_xref= and /translation=, and that for each of the
> segments separated by the string "CDS". The output for the example
> below should look like this:
>
> >V001|AAM13451.1|GI:20152990
> MESLKYFYSLSLSLFNGLTKILNLFLMESLKYFYSLSLSLFNGL
> TKILNLFLMVSIKRSIFLTL
> >V002|AAA60951.1|GI:333518
> KQIVLACICLAAVAIPTSLQQSFSSSSSCTEEENKHHMGIDVI
> IKVTKQDQTPTNDKICQSVTEVTESEDESEEVVKGDPTTYYTVVGGGLTMDFGFTKCP
> KISSISEYSDGNTVNARLSSVSPGQGKDSPAITREEALSMIKDCEMSINIKCSEEEKD
> SNIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEISVRIGDMCKESSELEVKDG
> FKYVDGSASEDAADDTSLINSAKLIACV
>
> So far I have used the code below, which actually works. However, I am
> not pleased with it, as it generates an empty element in the hash from
> the header of the file, and because there might be a better way to do
> this. I would therefore be very pleased with any input or alternative
> way to improve the code.
> Regards,
> pedro
> #!/usr/sbin/perl -w
> $/ = "\n CDS";
> while(<>){
>     $_ =~ /product=\"(.+)\"/;
>     $gname = $1;
>     $gname =~ s/\s+//g;
>     push @ID, $gname;
>     $_ =~ /protein_id="([\w\.]+)\"/;
>     $ref = $1;
>     $_=~ /db_xref=\"GI:(\w+)\"/;
>     $gid = $1;
>     $_ =~ /translation=\"([A-Z\s]+)/;
>     $seq = $1;
>     $seq =~ s/\s+//g;
>     $hash{$gname} = ["$ref", "$gid", "$seq"];
> }
> open(F, ">test");
> foreach $key (@ID){
>     print F ">gi|$hash{$key}[1]|$hash{$key}[0] $key\n$hash{$key}[2]\n";
> }
> close(F);
>
> [snip]

You probably want something like this:

#!/usr/sbin/perl -w
use strict;

$/ = "\n CDS";
# Parenthesize open so that "or die" applies to open, not to '>test'.
open( F, '>test' ) or die "Cannot open 'test': $!";
while ( <> ) {
    # The first chunk is the file header; it has no /product=, so skip it.
    my ($gname) = /product="([^"]+)"/ or next;
    $gname =~ s/\s+//g;
    my ($ref) = /protein_id="([\w.]+)"/;
    my ($gid) = /db_xref="(GI:\w+)"/;
    my ($seq) = /translation="([A-Z\s]+)"/;
    $seq =~ s/\s+//g;
    # Leading ">" matches the FASTA-style header in the requested output.
    print F ">$gname|$ref|$gid\n$seq\n";
    }
close F;


John
-- 
use Perl;
program
fulfillment
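
For trying the approach without the original file, here is a minimal self-contained sketch. It is a variation, not the code posted in this thread: it splits the text with a regex instead of setting $/, so it does not depend on the exact number of spaces before CDS, and it skips any chunk that lacks one of the four qualifiers, which covers the file header. The helper name cds_to_fasta, the LOCUS/FEATURES lines and the location 1..195 are placeholders assumed for illustration; the qualifier values come from the expected output quoted above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns one FASTA-style
# record per CDS chunk that carries all four qualifiers.
sub cds_to_fasta {
    my ($text) = @_;
    my @fasta;
    # Break the text at every line whose first word is CDS.
    for my $chunk ( split /^[ \t]+CDS[ \t]/m, $text ) {
        # A chunk missing any qualifier (e.g. the file header) is skipped.
        my ($gname) = $chunk =~ /product="([^"]+)"/        or next;
        my ($ref)   = $chunk =~ /protein_id="([\w.]+)"/    or next;
        my ($gid)   = $chunk =~ /db_xref="(GI:\w+)"/       or next;
        my ($seq)   = $chunk =~ /translation="([A-Z\s]+)"/ or next;
        # Remove line wrapping and padding inside the captured values.
        s/\s+//g for $gname, $seq;
        push @fasta, ">$gname|$ref|$gid\n$seq\n";
    }
    return @fasta;
}

# Assumed input layout, reduced to a single record.
my $sample = <<'END';
LOCUS       EXAMPLE
FEATURES             Location/Qualifiers
     CDS             1..195
                     /product="V001"
                     /protein_id="AAM13451.1"
                     /db_xref="GI:20152990"
                     /translation="MESLKYFYSLSLSLFNGLTKILNLFLMESLKYFYSLSLSLFNGL
                     TKILNLFLMVSIKRSIFLTL"
END

print cds_to_fasta($sample);
```

Run on the sample, this should print the header line >V001|AAM13451.1|GI:20152990 followed by the translation joined onto a single line. Returning a list rather than printing inside the loop keeps the parsing separate from the choice of output file.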