Pedro Antonio Reche wrote:
> 
> Hi, I am interested in parsing the file at the bottom of this e-mail in
> order to extract the strings between "" following /product=,
> /protein_id=, /db_xref= and /translation=, for each of the
> segments separated by the string "CDS". The output for the example
> below should look like this:
> 
> >V001|AAM13451.1|GI:20152990
> MESLKYFYSLSLSLFNGLTKILNLFLMESLKYFYSLSLSLFNGL
> TKILNLFLMVSIKRSIFLTL
> >V002|AAA60951.1|GI:333518
> KQIVLACICLAAVAIPTSLQQSFSSSSSCTEEENKHHMGIDVI
> IKVTKQDQTPTNDKICQSVTEVTESEDESEEVVKGDPTTYYTVVGGGLTMDFGFTKCP
> KISSISEYSDGNTVNARLSSVSPGQGKDSPAITREEALSMIKDCEMSINIKCSEEEKD
> SNIKTHPVLGSNISHKKVSYEDIIGSTIVDTKCVKNLEISVRIGDMCKESSELEVKDG
> FKYVDGSASEDAADDTSLINSAKLIACV
> 
> So far I have used the code below, which actually works. However, I am not
> pleased with it, as it generates an empty element in the hash from the
> header of the file, and because there might be a better way to do
> this. I would therefore be very grateful for any input or alternative way
> to improve the code.
> Regards,
> pedro
> #!/usr/sbin/perl -w
> $/ = "\n     CDS";
> while(<>){
>         $_ =~ /product=\"(.+)\"/;
>                 $gname = $1;
>                 $gname =~ s/\s+//g;
>                 push @ID, $gname;
>         $_ =~ /protein_id="([\w\.]+)\"/;
>                 $ref = $1;
>         $_=~ /db_xref=\"GI:(\w+)\"/;
>                 $gid = $1;
>         $_ =~ /translation=\"([A-Z\s]+)/;
>                 $seq = $1;
>                 $seq  =~ s/\s+//g;
>                $hash{$gname} = ["$ref", "$gid", "$seq"];
> }
> open(F, ">test");
> foreach $key (@ID){
>         print F ">gi|$hash{$key}[1]|$hash{$key}[0]
> $key\n$hash{$key}[2]\n";
> }
> close(F);
> 
> [snip]


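You probably want something like this. It captures each field in list
context, skips the chunk that comes before the first CDS (the file header,
which is where your empty hash element came from), and prints each record
as soon as it is read, so no hash is needed:

#!/usr/sbin/perl -w
use strict;

# Read the input in chunks that end at each "CDS" feature line.
$/ = "\n     CDS";

open F, '>test' or die "Cannot open 'test': $!";

while ( <> ) {
    my ($gname) = /product="([^"]+)"/;
    my ($ref)   = /protein_id="([\w.]+)"/;
    my ($gid)   = /db_xref="(GI:\w+)"/;
    my ($seq)   = /translation="([A-Z\s]+)"/;

    # The header chunk (or any incomplete record) lacks a field; skip it.
    next if grep { !defined } $gname, $ref, $gid, $seq;

    $gname =~ s/\s+//g;
    $seq   =~ s/\s+//g;

    # Leading ">" gives the FASTA-style header shown in your example.
    print F ">$gname|$ref|$gid\n$seq\n";
}

close F or die "Cannot close 'test': $!";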
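
If you want to try the approach without the full GenBank file, here is a
minimal sketch of the same idea run against a string instead of <>. The
sample record is made up for illustration (it borrows the IDs from your
expected output, with shortened sequences), and the sub name
fasta_from_text is my own invention, not anything standard. It assumes the
"CDS" lines are indented by exactly five spaces, as in the separator above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample in GenBank-like layout (illustration only).
my $sample = <<'END';
LOCUS       EXAMPLE
FEATURES             Location/Qualifiers
     source          1..100
     CDS             1..54
                     /product="V001"
                     /protein_id="AAM13451.1"
                     /db_xref="GI:20152990"
                     /translation="MESLKYFYSL
                     SLSLFNGL"
     CDS             60..89
                     /product="V002"
                     /protein_id="AAA60951.1"
                     /db_xref="GI:333518"
                     /translation="KQIVLACICL"
END

# Hypothetical helper: return one FASTA-style entry per complete CDS chunk.
sub fasta_from_text {
    my ($text) = @_;
    local $/ = "\n     CDS";          # same record separator as above
    open my $in, '<', \$text or die "Cannot read sample: $!";

    my @entries;
    while ( my $chunk = <$in> ) {
        my ($gname) = $chunk =~ /product="([^"]+)"/;
        my ($ref)   = $chunk =~ /protein_id="([\w.]+)"/;
        my ($gid)   = $chunk =~ /db_xref="(GI:\w+)"/;
        my ($seq)   = $chunk =~ /translation="([A-Z\s]+)"/;

        # The header chunk has none of these fields, so it is skipped here.
        next if grep { !defined } $gname, $ref, $gid, $seq;

        s/\s+//g for $gname, $seq;    # drop spaces and line wraps
        push @entries, ">$gname|$ref|$gid\n$seq\n";
    }
    close $in;
    return @entries;
}

print fasta_from_text($sample);
# prints:
# >V001|AAM13451.1|GI:20152990
# MESLKYFYSLSLSLFNGL
# >V002|AAA60951.1|GI:333518
# KQIVLACICL
```

Reading from a reference to a scalar (\$text) needs Perl 5.8 or later. The
"local" keeps the changed $/ from leaking into the rest of the program.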




John
-- 
use Perl;
program
fulfillment
