On Thu, Jun 2, 2011 at 4:41 PM, venkates <venka...@nt.ntnu.no> wrote:
> On 6/2/2011 2:44 PM, Rob Coops wrote:
>
>> On Thu, Jun 2, 2011 at 1:28 PM, venkates<venka...@nt.ntnu.no> wrote:
>>
>> On 6/2/2011 12:46 PM, John SJ Anderson wrote:
>>>
>>> On Thu, Jun 2, 2011 at 06:41, venkates<venka...@nt.ntnu.no> wrote:
>>>>
>>>> Hi,
>>>>>
>>>>> I want to parse a file with contents that looks as follows:
>>>>>
>>>>> [ snip ]
>>>>
>>>> Have you considered using this module? ->
>>>> <http://search.cpan.org/dist/BioPerl/Bio/SeqIO/kegg.pm>
>>>>
>>>> Alternatively, I think somebody on the BioPerl mailing list was
>>>> working on another KEGG parser...
>>>>
>>>> chrs,
>>>> j.
>>>>
>>>> I am doing this as an exercise to learn parsing techniques so guidance
>>>>
>>> help needed.
>>>
>>> Aravind
>>>
>>> --
>>> To unsubscribe, e-mail: beginners-unsubscr...@perl.org
>>> For additional commands, e-mail: beginners-h...@perl.org
>>> http://learn.perl.org/
>>>
>>> This is a simple and ugly way of parsing your file:
>>
>> use strict;
>> use warnings;
>> use Carp;
>> use Data::Dumper;
>>
>> my $set = parse("ko");
>>
>> sub parse {
>>   my $keggFile = shift;
>>   my $keggHash;
>>
>>   my $counter = 1;
>>
>>   open my $fh, '<', $keggFile || croak ("Cannot open file '$keggFile': $!");
>>   while (<$fh> ) {
>>     chomp;
>>     if ( $_ =~ m!///! ) {
>>       $counter++;
>>       next;
>>     }
>>
>>     if ( $_ =~ /^ENTRY\s+(.+?)\s/sm ) { ${$keggHash}{$counter} = { 'ENTRY' => $1 }; }
>>
> While trying a similar thing for DEFINITION record, instead of appending
> current hash with ENTRY and NAME, the DEFINITION record replaces the
> contents in the hash?
>
> $VAR1 = {
>           '4' => {
>                    'DEFINITION' => 'U18 small nucleolar RNA'
>                  },
>           '1' => {
>                    'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]'
>                  },
>           '3' => {
>                    'DEFINITION' => 'U14 small nucleolar RNA'
>                  },
>           '2' => {
>                    'DEFINITION' => 'alcohol dehydrogenase (NADP+) [EC:1.1.1.2]'
>                  },
>           '5' => {
>                    'DEFINITION' => 'U24 small nucleolar RNA'
>                  }
>         };
>
> code: in addition to what you had suggested -
>     if($_ =~ /^DEFINITION\s{2}(.+)?/){
>       ${$keggHash}{$counter} = {'DEFINITION' => $1};
>
>     }
>
>>     if ( $_ =~ /^NAME\s+(.*)$/sm ) {
>>       my $temp = $1;
>>       $temp =~ s/,\s/,/g;
>>       my @names = split /,/, $temp;
>>       push @{${$keggHash}{$counter}{'NAME'}}, @names;
>>     }
>>   }
>>   close $fh;
>>   print Dumper $keggHash;
>> }
>>
>> The output being:
>>
>> $VAR1 = {
>>           '1' => {
>>                    'NAME' => [
>>                                'E1.1.1.1',
>>                                'adh'
>>                              ],
>>                    'ENTRY' => 'K00001'
>>                  },
>>           '3' => {
>>                    'NAME' => [
>>                                'U18snoRNA',
>>                                'snR18'
>>                              ],
>>                    'ENTRY' => 'K14866'
>>                  },
>>           '2' => {
>>                    'NAME' => [
>>                                'U14snoRNA',
>>                                'snR128'
>>                              ],
>>                    'ENTRY' => 'K14865'
>>                  }
>>         };
>>
>> Which to me looks sort of like what you are looking for.
>> The main thing I did was read the file one line at a time to prevent an
>> unexpectedly large file from causing memory issues on your machine (in
>> the end the structure that you are building will cause enough issues
>> when handling a large file).
>>
>> You already dealt with the Entry bit so I'll leave that open though I
>> slightly changed the regex but nothing spectacular there.
>> The Name bit is simple as I just pull out all of them, then remove all
>> spaces and split them into an array, feed the array to the hash and hop,
>> time for the next step which is up to you ;-)
>>
>> I hope it helps you a bit, regards,
>>
>> Rob
>>
>

What you do:

  ${$keggHash}{$counter} = {'DEFINITION' => $1};

Try the following:

  ${$keggHash}{$counter}{'DEFINITION'} = $1;

To make things a little clearer, look at the following example.

  my %hash;
  $hash{'Key 1'} = { 'Nested Key 1' => 'Value 1' };

What you do is say:

  $hash{'Key 1'} = { 'Nested Key 2' => 'Value 2' };

What I do is:

  $hash{'Key 1'}{'Nested Key 2'} = 'Value 2';

In your script you will end up with the following:

  $VAR1 = {
    'Key 1' => {
      'Nested Key 2' => 'Value 2',
    },
  };

Where mine will result in:

  $VAR1 = {
    'Key 1' => {
      'Nested Key 1' => 'Value 1',
      'Nested Key 2' => 'Value 2',
    },
  };

Not that much different, but you are basically overwriting the value
( {NAME=>[], ENTRY=>''} ) associated with your key ($counter) with
{ 'DEFINITION' => '' }. Assigning a new anonymous hash to $counter throws
away whatever was stored there before. If you instead add a new key to the
hash that is already associated with your main key ($counter), then you
will get the result you are looking for.

Regards,

Rob
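
P.S. For what it's worth, here is a minimal sketch of how the whole parser might look once every field is stored with a nested-key assignment. It is only an illustration under a few assumptions: the sample input is hypothetical (reconstructed from the Dumper output earlier in this thread, since the real "ko" file was snipped), the name parse_kegg is made up for this example, and it reads from an in-memory string rather than a file so it can be run as-is.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical sample input, reconstructed from the Dumper output quoted
# in this thread; the real "ko" file was snipped from the original post.
my $sample = <<'END';
ENTRY       K00001                      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
///
ENTRY       K14865                      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
///
END

# parse_kegg is an illustrative name, not part of any module.
sub parse_kegg {
    my ($fh) = @_;
    my %kegg;
    my $counter = 1;

    while ( my $line = <$fh> ) {
        chomp $line;

        # A line starting with three slashes ends the current record.
        if ( $line =~ m{^///} ) {
            $counter++;
            next;
        }

        # Each branch assigns to one nested key, so fields stored by
        # earlier lines of the same record are left in place.
        if ( $line =~ /^ENTRY\s+(\S+)/ ) {
            $kegg{$counter}{ENTRY} = $1;
        }
        elsif ( $line =~ /^NAME\s+(.*)$/ ) {
            my $names = $1;    # copy before running another regex
            push @{ $kegg{$counter}{NAME} }, split /,\s*/, $names;
        }
        elsif ( $line =~ /^DEFINITION\s+(.*)$/ ) {
            $kegg{$counter}{DEFINITION} = $1;
        }
    }
    return \%kegg;
}

# Read from an in-memory string so the sketch runs without a file on disk.
open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $set = parse_kegg($fh);
close $fh;

$Data::Dumper::Sortkeys = 1;
print Dumper($set);
```

One small difference from the script quoted above: the open call here uses "or" rather than "||". Because "||" binds more tightly than the comma, "$keggFile || croak(...)" only tests the file name itself, so the croak would not fire when open fails.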