All,

I've run into a problem with HTML::TreeBuilder when processing large pages.  My
process size jumps enormously: for example, an 8MB process grows by about 72MB
when processing a 2.6MB web page, and that memory is not released when
processing finishes.  (Is this an artifact of Perl's memory management
architecture?)  Furthermore, retrieving another large page causes more process
growth, although not as much as the first time.  By commenting out the function
calls in the test script one at a time, I've traced by far the largest share of
the memory growth to the buildtree() function, which calls HTML::TreeBuilder.
(The test script and sample output are at the end of this message.)
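For reference, Perl 5 frees memory by reference counting, not with a tracing collector. As a result, a structure whose parts point at each other is never freed on its own. The parent and child links in an HTML::TreeBuilder tree are one example, which is why the HTML::TreeBuilder documentation says to call `$tree->delete` when you are done with a tree. Separately, memory that perl does free is normally kept by its allocator for reuse rather than handed back to the operating system. That fits the numbers below: the process stays near 75MB after the big page, but later large pages add comparatively little.

The minimal sketch below illustrates the reference-counting point. The `Node` class is hypothetical; it is not part of HTML::TreeBuilder. The sketch assumes Scalar::Util is available (it is bundled with perl 5.8 and later).

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Scalar::Util ();

my $destroyed = 0;    # incremented each time a Node is destroyed

# Hypothetical tree node with a parent back-link (illustration only).
package Node;
sub new {
    my ($class) = @_;
    return bless { parent => undef, kids => [] }, $class;
}
sub add_child {
    my ($self, $child, $weak) = @_;
    push @{ $self->{kids} }, $child;
    $child->{parent} = $self;                          # creates a cycle
    Scalar::Util::weaken($child->{parent}) if $weak;   # weak link does not count
    return $child;
}
sub DESTROY { $destroyed++ }

package main;

# 1. Strong parent <-> child cycle: leaving the block does NOT free
#    either node, because each still holds a reference to the other.
{
    my $root = Node->new;
    $root->add_child(Node->new, 0);
}
my $after_cycle = $destroyed;                  # 0: both nodes leaked

# 2. Same shape, but the back-link is weak: both nodes are freed.
{
    my $root = Node->new;
    $root->add_child(Node->new, 1);
}
my $after_weak = $destroyed - $after_cycle;    # 2

print "strong cycle freed: $after_cycle, weak back-link freed: $after_weak\n";
```

HTML::Element's `delete` method exists to break the first kind of cycle by hand.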

Is there a way to get this memory back after processing a large page?  Does
Perl have a way to force garbage collection, à la Java?
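Perl 5 has no call that forces a collection or shrinks the process. A common workaround is to do the memory-hungry work in a forked child process. The parent stays small, and everything the child allocated goes back to the operating system when the child exits. The sketch below assumes a Unix-like system with fork() and perl 5.6 or later (for lexical filehandles). `run_in_child` is a hypothetical helper, not a library function, and the anonymous sub stands in for the post's geturllength().

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: run $work->() in a child process and return its
# (single-line) result to the parent through a pipe.
sub run_in_child {
    my ($work) = @_;
    pipe(my $reader, my $writer) or die "pipe: $!";
    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {                 # child: do the work, report, exit
        close $reader;
        my $result = $work->();
        print {$writer} "$result\n";
        close $writer;
        exit 0;                      # child's memory is released here
    }
    close $writer;                   # parent: read the result, reap child
    my $result = <$reader>;
    close $reader;
    waitpid($pid, 0);
    chomp $result if defined $result;
    return $result;
}

# Stand-in for geturllength(): builds a large temporary structure.
my $n = run_in_child(sub {
    my @big = (1) x 1_000_000;
    return scalar @big;
});
print "child returned $n\n";
```

In the test script, wrapping each call to geturllength() in run_in_child() should keep the parent near its starting size. The cost is one fork per page.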

I've seen this behavior under perl 5.003 (RH Linux 6.1), 5.6 (RH Linux 7.0)
and 5.6.1 (RH Linux 7.1).

Curt
-------------------------

#!/usr/bin/perl -w
# usage: ./memtest < input_file   (one URL per line)
use strict;
use LWP::UserAgent;
use HTTP::Request;
use HTML::TreeBuilder;
use HTML::FormatText;

# Render a parse tree as plain text (called by buildtree()).
sub formattext
{
        my $html = shift;
        my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 250);
        return $formatter->format($html);
}

# Parse the response body into a tree, format it, then delete the tree
# to break its parent/child reference cycles (called by geturllength()).
sub buildtree
{
        my $Response = shift;
        my $html = HTML::TreeBuilder->new();
        $html->parse($Response->content);
        $html->eof;                # signal end of document before using the tree
        formattext($html);
        $html->delete;
        return;
}

# Fetch a URL, build and format its tree, and return the length of the
# full response (called by main loop).
sub geturllength
{
        my $URL = shift;
        my $UA = LWP::UserAgent->new();
        my $Request = HTTP::Request->new(GET => $URL);
        my $Response = $UA->request($Request);
        print "Error retrieving $URL\n" if $Response->is_error();
        buildtree($Response);
        return length($Response->as_string);
}

# Return the process's virtual memory size in bytes
# (field 23 of /proc/<pid>/stat), or "N/A" if it can't be read.
sub memused
{
        local *STAT;
        open(STAT, "</proc/$$/stat") or return "N/A";
        my $line = <STAT>;
        close STAT;
        my @fields = split(' ', $line);
        return $fields[22];
}

print "resp. lngth\tMem. used\tChange\tURL retrieved\n";
my $lastused = 0;
while (my $url = <STDIN>)
{
        chomp $url;
        my $length = geturllength($url);
        sleep 5; # pause between requests; /proc/$$/stat is regenerated on every read
        my $used = memused();
        my $delta = $used - $lastused;
        print "$length\t$used\t$delta\t$url\n";
        $lastused = $used;
}

-----------------

Results from arbitrarily selected web pages (all figures in bytes):

resp. lngth  Mem. used  Change    URL retrieved
204210       8081408    8081408   http://www.iawa.org/members.html
119468       9183232    1101824   http://www.iaw.on.ca/~fridguy/cgi-bin/db.cgi
345981       11227136   2043904   http://www.ibabowl.com/LocalData.htm
123037       11137024   -90112    http://www.ibac.org/Bulletins/ibac_b00-2.htm
2641177      82894848   71757824  http://www.furman.edu/admin/alumni/registry/visitors2.html
2580581      82657280   -237568   http://www.furman.edu/admin/alumni/registry/visitors.html
164452       75087872   -7569408  http://www.i-base.org.uk/publications/bulletins/htb2/htb2.html
152871       74932224   -155648   http://www.ibasis.net/news/pr01302001.htm
463743       75554816   622592    http://www.ibasis.net/news/pr07192000a.htm
100676       74719232   -835584   http://www.ibat.org/Vend2.htm
170515       74891264   172032    http://www.ibb.hr/komponente.html
188279       74809344   -81920    http://www.ibcmc.com/browsedb2.asp
123066       74866688   57344     http://www.ibcsports.com/west_virginia_state_bb_2_27.htm
Error retrieving http://www.ibegcom.com/company.htm
120          74620928   -245760   http://www.ibegcom.com/company.htm
Error retrieving http://www.ibegcom.com/units.htm
120          74620928   0         http://www.ibegcom.com/units.htm
159397       74780672   159744    http://www.iberbyte.es/iberbyte/F_Productos_todos.html
140987       74764288   -16384    http://www.ibertel.com/atlantis/tarcon.html
199722       75022336   258048    http://www.ibfnet.de/katalog/software/softwarekommunikation.htm
125819       74870784   -151552   http://www.ib.hu-berlin.de/~wumsta/uk/plan.html
103928       74723328   -147456   http://www.ibia.org/news.htm
135435       74891264   167936    http://www.ibia.org/policy.htm
121747       75218944   327680    http://www.ibiblio.org/london/agriculture/faqs/1/msg00027.html
200740       75657216   438272    http://www.ibiblio.org/london/permaculture/mailarchives/sanet2/maillist.html
149005       75476992   -180224   http://www.ibiblio.org/london/permaculture/mailarchives/sanet2/threads.html
121177       75587584   110592    http://www.ibiblio.org/pub/academic/agriculture/agronomy/AGMODELS-L/199602xx.agm.html
197070       75722752   135168    http://www.ibiblio.org/pub/academic/agriculture/agronomy/AGMODELS-L/log9503.agmodels-l.html
117475       75325440   -397312   http://www.ibisnet.org/200102/index.html
106425       75325440   0         http://www.iblcham.ch/promo/market.htm
208249       75743232   417792    http://www.ibl.com/worldinfo/appc.html
203534       75530240   -212992   http://www.ibl.com/writerinfo/caribbean/dominicanrepublic.htm
134846       75567104   36864     http://www.ibmlink.ibm.com/cgi-bin/master?xu=guest&xp=&xh=logon&request=announcements&parms=G_294-519
100902       75427840   -139264   http://www.ibmlink.ibm.com/usalets&parms=H_200-288
117587       75325440   -102400   http://www.ibmlink.ibm.com/usalets&parms=H_299-023
161490       75489280   163840    http://www.ibo-ny.com/members3.htm
135041       75325440   -163840   http://www.ibo-ny.com/members4.htm
110164       75436032   110592    http://www.ibo-ny.com/members.htm
115960       75563008   126976    http://www.ibpinetsp.com.br/rede/not_informe.html
147154       75845632   282624    http://www.ibpmt.com/search_0.htm
103657       75325440   -520192   http://www.ibrc.indiana.edu/affiliates.html
133738       75460608   135168    http://www.i-b-r.org/ir00020b.htm
128832       75325440   -135168   http://www.ibss.iuf.net/common/irsbabs.html
147379       75575296   249856    http://www.ibt.ku.dk/nsfk/Newsletter/nk242.html
121759       75325440   -249856   http://www.ibt.ku.dk/nsfk/Newsletter/nk243.htm
151322       75755520   430080    http://www.ibt.ku.dk/nsfk/Newsletter/nk251.htm
111194       75452416   -303104   http://www.ibt.ku.dk/nsfk/Newsletter/nk252.htm
140327       75730944   278528    http://www.ibt.ku.dk/nsfk/newsletter/nk253.htm
142067       75325440   -405504   http://www.ibt.ku.dk/nsfk/newsletter/nk261.htm
189773       75722752   397312    http://www.ibunka.com/translation/dataj.html
107048       75325440   -397312   http://www.ibuyer.net/rate_list.html?cid=338
271704       76124160   798720    http://www.ibw.com.ni/~chiste/
134602       75403264   -720896   http://www.icamo.ind.br/revend_sudeste.htm
118548       75444224   40960     http://www.icann.org/correspondence/cerf-testimony-08feb01.htm
136866       75583488   139264    http://www.icann.org/registrars/accreditation-qualified-list.html
2595444      80711680   5128192   http://www.icann.org/tlds/africa1/APPLICATION  AND REGISTRY OPERATOR'S PROPOSAL.htm
