Hi Stas,

While having a look at Apache::Status and playing around with your tips on

http://www.apacheweek.com/features/mod_perl11

I found some interesting results and a workable compromise:

In a module I loaded a CSV file as class data into different structures
and compared the output of Apache::Status with that of top.

Enclosed you'll find a test report.

The code below 'building' shows how the lines are put into the
structures.

The lines below 'perl-status' show the output of Apache::Status.
The line below 'top' shows the output of top.

Examples for the tested structures are:

$buffer = "1\tr1v1\tr1v2\tr1v3\n2\tr2v1\tr2v2\tr2v3\n" ...

@lines = (
        "1\tr1v1\tr1v2\tr1v3",
        "2\tr2v1\tr2v2\tr2v3",
        ... )

%data = (
        1 => [ 1, 'r1v1' , 'r1v2' , 'r1v3' ],
        2 => [ 2, 'r2v1' , 'r2v2' , 'r2v3' ],
        ... )

$pack = {
        1 => [ 1, 'r1v1' , 'r1v2' , 'r1v3' ],
        2 => [ 2, 'r2v1' , 'r2v2' , 'r2v3' ],
        ... }

%index = (
        1 => "1\tr1v1\tr1v2\tr1v3",
        2 => "2\tr2v1\tr2v2\tr2v3",
        ... )
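All five variants are filled from the same CSV, one line per record; a
minimal, self-contained sketch of the loading loop (the two sample
records here are stand-ins for the real 14350-record file, and the
%index variant is shown as the example):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two sample records standing in for the 14350-record CSV file.
my @csv = ( "1\tr1v1\tr1v2\tr1v3", "2\tr2v1\tr2v2\tr2v3" );

my %index;
for my $line (@csv) {
    my @record = split /\t/, $line;    # per-record "building" code
    my $key    = 0 + $record[0];       # numeric key from the first field
    $index{$key} = $line;              # the %index variant, for example
}

print scalar( keys %index ), "\n";     # prints "2"
```

The per-structure "building" lines shown in the test report below each
replace the body of this loop.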

One thing I realized using Devel::Peek is that in a hash of
array-refs, each item in the array is a full-blown Perl scalar with its
own flags etc.  That seems to be the reason for the 'memory explosion'.
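That observation can be reproduced in a few lines; Dump() makes visible
that every element of the anonymous array is a complete SV with its own
head (type, refcount, flags) plus its own string buffer, and that
per-element overhead adds up over 14350 records of four fields each:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Devel::Peek;

# One record of the %data structure from above.
my %data = ( 1 => [ 1, 'r1v1', 'r1v2', 'r1v3' ] );

# Dump() prints the full SV details (flags, refcount, PV buffer)
# of a single array element to STDERR.
Dump( $data{1}[1] );
```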

Another thing I found is that Apache::Status does not always seem to
report complete values.  Therefore I recorded the sizes from top, too.

Especially for the hash of array-refs (%data) and the hash-ref of
array-refs ($pack), perl-status reports only part of the memory
actually used: for $pack only the reference (16 bytes), for %data only
the keys (?).

As a compromise I'll use the %index structure.  It is small enough
while providing fast access.  A further optimization will be to remove
the redundant key field from each line.
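That optimization could look like this (a hypothetical sketch: the
get_record() helper is an assumption, rebuilding the full record only
on demand from the key-stripped line):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @csv = ( "1\tr1v1\tr1v2\tr1v3", "2\tr2v1\tr2v2\tr2v3" );

my %index;
for my $line (@csv) {
    # Split off the key field once and store only the remainder.
    my ( $key, $rest ) = split /\t/, $line, 2;
    $index{ 0 + $key } = $rest;
}

# Rebuild a full record (key included) only when it is needed.
sub get_record {
    my ($key) = @_;
    return [ $key, split /\t/, $index{$key} ];
}

print join( ',', @{ get_record(2) } ), "\n";   # prints "2,r2v1,r2v2,r2v3"
```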

Success: a reduction from 26 MB to 7 MB, which is what I estimated in
my first mail.

A last word from perldebguts.pod:

|| Perl is a profligate wastrel when it comes to memory use.  There is a
|| saying that to estimate memory usage of Perl, assume a reasonable
|| algorithm for memory allocation, multiply that estimate by 10, and
|| while you still may miss the mark, at least you won't be quite so
|| astonished.  This is not absolutely true, but may provide a good grasp
|| of what happens.
||
|| [...]
||
|| Anecdotal estimates of source-to-compiled code bloat suggest an
|| eightfold increase.

Perhaps my experiences could be added to the long list of anecdotes ;-))

Thank you all again for escorting me on this deep dive.

Ernest

--

*********************************************************************
* VIRTUALITAS Inc.               *                                  *
*                                *                                  *
* European Consultant Office     *      http://www.virtualitas.net  *
* Internationales Handelszentrum *   contact:Ernest Lergon          *
* Friedrichstraße 95             *    mailto:[EMAIL PROTECTED] *
* 10117 Berlin / Germany         *       ums:+49180528132130266     *
*********************************************************************
       PGP-Key http://www.virtualitas.net/Ernest_Lergon.asc




TEST REPORT
===========

CSV file:
        14350 records
        CSV     2151045 bytes = 2101 Kbytes
        CSV_2   2136695 bytes = 2086 Kbytes (w/o CR)

1       all empty
=================

building:
        none

perl-status:
        *buffer{SCALAR}           25 bytes
        *lines{ARRAY}             56 bytes
        *data{HASH}              228 bytes
        *pack{SCALAR}             16 bytes
        *index{HASH}             228 bytes

top:
        12992  12M 12844   base

2       buffer
==============

building:
        $buffer .= $_ . "\n";

perl-status:
        *buffer{SCALAR}      2151069 bytes = CSV + 24 bytes
        *lines{ARRAY}             56 bytes
        *data{HASH}              228 bytes
        *pack{SCALAR}             16 bytes
        *index{HASH}             228 bytes

top:
        17200  16M 17040   base + 4208 Kbytes = CSV + 2107 KBytes

3       lines
=============

building:
        push @lines, $_;

perl-status:
        *buffer{SCALAR}           25 bytes
        *lines{ARRAY}        2519860 bytes = CSV_2 + 383165 bytes
                                             (approx. 27 * 14350 )
        *data{HASH}              228 bytes
        *pack{SCALAR}             16 bytes
        *index{HASH}             228 bytes

top:
        18220  17M 18076   base + 5228 Kbytes = CSV_2 + 3142 Kbytes

4       data
============

building:
        @record = split ( "\t", $_ );
        $key = 0 + $record[0];
        $data{$key} = [ @record ];

perl-status:
        *buffer{SCALAR}           25 bytes
        *lines{ARRAY}             56 bytes
        *data{HASH}           723302 bytes = approx. 50 * 14350 (key + ref)
                                             (where is the data?)
        *pack{SCALAR}             16 bytes
        *index{HASH}             228 bytes

top:
        40488  38M 39208   base + 27566 Kbytes = CSV_2 + 25480 Kbytes (!)

5       pack
============

building:
        @record = split ( "\t", $_ );
        $key = 0 + $record[0];
        $pack->{$key} = [ @record ];

perl-status:
        *buffer{SCALAR}           25 bytes
        *lines{ARRAY}             56 bytes
        *data{HASH}              228 bytes
        *pack{SCALAR}             16 bytes (where is the data?)
        *index{HASH}             228 bytes

top:
        40492  39M 40340   base + 27570 Kbytes = CSV_2 + 25484 Kbytes (!)

6       index
=============

building:
        @record = split ( "\t", $_ );
        $key = 0 + $record[0];
        $index{$key} = $_;              # !!!

perl-status:
        *buffer{SCALAR}           25 bytes
        *lines{ARRAY}             56 bytes
        *data{HASH}              228 bytes
        *pack{SCALAR}             16 bytes
        *index{HASH}         2989146 bytes = CSV_2 + 852448 bytes
                                             ( approx. 59 * 14350 )

top:
        19988  19M 19824   base + 6996 Kbytes = CSV_2 + 4910 Kbytes

EOF
