Hi, I want to parse a file whose contents look as follows:

ENTRY       K00001      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
///
ENTRY       K14865      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome Biogenesis [BR:ko03009]
///
ENTRY       K14866      KO
NAME        U18snoRNA, snR18
DEFINITION  U18 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome Biogenesis [BR:ko03009]
///

Each record ends with "///". The ultimate aim is to store the information from each record (for instance ENTRY and NAME) in a data structure (a hash) such as (ENTRY => K14865; NAME => [U14snoRNA, snR128]; and so on). To start off, I have produced the following snippet:

use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = &parse("D:/workspace/KEGG_Parser/data/ko");

sub parse {
    my $keggFile = shift;
    my $keggHash;
    open my $fh, '<', $keggFile || croak ("Cannot open file '$keggFile': $!");
    my $contents = do {local $/; <$fh>};
    my @rec = split ('///', $contents);
    foreach my $line (@{rec}){
        next if ($line =~ /^\s*$/);
        if ($line =~ /^ENTRY\s{7}(.+?)\s+/){
            $keggHash->{'ENTRY'}= $1;
        }
        elsif ($line =~ /^NAME\s{8}(.+?)$/){
            push @{$keggHash->{'NAME'}}, $1;
        }
        else{}
    }
    print Dumper($keggHash);
    close $fh;
}

The output I get is:

$VAR1 = {
          'ENTRY' => 'K00001'
        };

Not all the lines in each element of @rec are getting read. I would appreciate it if somebody could guide me through this.

Thanks to all,

-- 
Aravind Venkatesan
Research Fellow, Systems Biology Group,
Dept. of Biology, NTNU
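
For reference, here is a rough sketch of one way the goal described above (one hash per record, with NAME held as a list) might be reached. It is not a definitive fix. Three things in the snippet look like likely causes of the output shown. Splitting on '///' yields whole records rather than lines, so each $line is really a multi-line record. Without the /m modifier, ^ matches only at the very start of that string, so only the first record's ENTRY matches and the elsif branch is never tried. All matches are also written into a single hash instead of one hash per record. Separately, `||` binds more tightly than the arguments of open, so the croak can never fire; `or` would. In the sketch, the helper name parse_records, the inline sample and the column spacing of the sample are assumptions made for illustration. The keyword pattern likewise assumes that field names are upper-case words starting in column 1 and that indented lines continue the previous field.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical helper (not from the original post): takes the file contents
# as one string and returns a reference to a list of record hashes.
sub parse_records {
    my ($text) = @_;
    my @records;

    # Split on lines consisting only of "///"; /m lets ^ and $ match per line.
    for my $chunk (split m{^///[ \t]*$}m, $text) {
        next if $chunk !~ /\S/;          # skip empty chunks
        my (%rec, $field);

        for my $line (split /\n/, $chunk) {
            next if $line !~ /\S/;
            if ($line =~ /^([A-Z_]+)\s+(.*?)\s*$/) {
                # A keyword in column 1 starts a new field.
                $field = $1;
                push @{ $rec{$field} }, $2;
            }
            elsif (defined $field && $line =~ /^\s+(.*?)\s*$/) {
                # An indented line continues the previous field.
                push @{ $rec{$field} }, $1;
            }
        }
        next unless %rec;

        # ENTRY: keep only the identifier (first whitespace-separated token).
        $rec{ENTRY} = (split ' ', $rec{ENTRY}[0])[0] if $rec{ENTRY};

        # NAME: split the comma-separated names into a flat list.
        if ($rec{NAME}) {
            my @names = map { split /\s*,\s*/, $_ } @{ $rec{NAME} };
            $rec{NAME} = \@names;
        }

        push @records, \%rec;
    }
    return \@records;
}

# Inline sample standing in for the real file (spacing is assumed).
my $sample = <<'END_SAMPLE';
ENTRY       K00001      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
///
ENTRY       K14865      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
///
END_SAMPLE

my $records = parse_records($sample);
print Dumper($records);
```

To use this with a real file, the contents could be slurped as in the original snippet (with `open ... or croak ...`) and passed to parse_records. Every field other than ENTRY is kept as an array reference so that multi-line fields such as PATHWAY keep all their lines.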