Re: Memory explodes loading CSV into hash

Stas Bekman Sun, 28 Apr 2002 10:18:56 -0700

Ernest Lergon wrote:
> Hi,
> 
> in a mod_perl package I load a CSV file on Apache startup into a simple
> hash as read-only class data to be shared by all children.
> 
> A loading routine reads the file line by line and uses one numeric field
> as the hash key (error checks etc. omitted):
> 
> package Data;
> 
> my $class_data = {};
> 
> ReadFile ( 'data.txt', $class_data, 4 );
> 
> sub ReadFile
> {
>       my $filename  = shift;  # path string
>       my $data      = shift;  # ref to hash
>       my $ndx_field = shift;  # field number
> 
>       my ( @record, $num_key );
>       local $_;
> 
>       open ( INFILE, "<$filename" );
>       while ( <INFILE> )
>       {
>               chomp;
>               @record = split "\t";
>               $num_key = $record[$ndx_field];
>               $data->{$num_key} = [ @record ];
>       }
>       close ( INFILE );
> }
> 
> sub new...
>       creates an object for searching the data, last result, errors etc.
> 
> sub find...
>       method with something like:
>               if exists $class_data->{$key}   return...
> etc.
> 
> Now I'm worried about the memory consumption:
> 
> The CSV file has 14,000 records with 18 fields and a size of 2 MB
> (approx. 150 bytes per record).
> 
> Without the loading step, top shows that each httpd instance uses 10 MB
> (all shared, as it should be).
> 
> But reading in the file blows the memory up to 36 MB (OK, shared as
> well)!
> 
> So how come 2 MB of data needs 26 MB of memory when it is stored as a
> hash?
> 
> Reading perldebguts.pod, I did not expect such an increase:
> 
> Description (avg.)           CSV         Perl (per field)   Perl (total)
> 4 string fields (4 chars)     16 bytes   32 bytes           128 bytes
> 9 float fields (5 chars)      45 bytes   24 bytes           216 bytes
> 5 string fields (rest)        89 bytes   32 bytes           160 bytes
> the integer key                          20 bytes            20 bytes
> Total                        150 bytes                      524 bytes


> That gives 14,000 x 524 = approx. 7 MB, not 26 MB!?
> 
> Lost in space...

Use Apache::Status, which can show you exactly where all the bytes go.
See the mod_perl guide or the Apache::Status manpage for more info. I
suggest that you experiment with a very small data set and look at how
much memory each record takes.
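
For the small-data-set experiment, something along these lines might be a
starting point. It is only a sketch, not what Apache::Status does
internally: it assumes the CPAN module Devel::Size is available (it is not
part of core Perl) and skips the measurement if it is not. The field values
are invented, and build_records and bytes_per_record are hypothetical
helper names; only the shape of the data (18 tab-separated fields, integer
key in field 4, stored as [ @record ]) follows the code in the question.

```perl
use strict;
use warnings;

# Devel::Size is a CPAN module (not core). Assumption: it may or may not
# be installed, so load it at run time and degrade gracefully.
my $have_devel_size = eval { require Devel::Size; 1 };

# Build $count synthetic records shaped like the data in the question:
# 18 tab-separated fields with the integer key in field 4.
# All field values are made up for illustration.
sub build_records {
    my ($count) = @_;
    my %data;
    for my $i ( 1 .. $count ) {
        my $line = join "\t",
            ('abcd') x 4,                        # 4 short string fields
            $i,                                  # integer key (field 4)
            ('1.234') x 9,                       # 9 float-like fields
            ('a somewhat longer string') x 4;    # 4 longer string fields
        my @record = split /\t/, $line;
        $data{ $record[4] } = [@record];         # same storage as ReadFile
    }
    return \%data;
}

# Average bytes per record, or undef when Devel::Size is unavailable.
sub bytes_per_record {
    my ($data) = @_;
    return undef unless $have_devel_size;
    my $count = keys %$data or return 0;
    return Devel::Size::total_size($data) / $count;
}

for my $count ( 10, 100, 1000 ) {
    my $data = build_records($count);
    my $avg  = bytes_per_record($data);
    if ( defined $avg ) {
        printf "%5d records: %.0f bytes per record\n", $count, $avg;
    }
    else {
        print "Devel::Size not installed; cannot measure\n";
        last;
    }
}
```

Comparing the per-record figure across the three sizes should show whether
the cost is roughly constant per record, and how far it is from the 524
bytes estimated above. The exact numbers will depend on the Perl version
and platform.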


__________________________________________________________________
Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker
http://stason.org/     mod_perl Guide ---> http://perl.apache.org
mailto:[EMAIL PROTECTED] http://use.perl.org http://apacheweek.com
http://modperlbook.org http://apache.org   http://ticketmaster.com
