All,
I've run into a problem with HTML::TreeBuilder when processing large pages. My
process size jumps enormously (e.g. an 8MB process increases to 72MB when
processing a 2.6MB web page), and that memory is not released when processing
finishes. (Is this an artifact of perl's memory management architecture?)
Furthermore, retrieving another large page results in more process
growth, although not as much as the first time. By commenting out each
function call in the test script, I've been able to attribute by far the
largest share of the memory growth to the buildtree() function, which calls
HTML::TreeBuilder. (Test script and sample output may be found at the end
of this message.)
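As an alternative to commenting out calls one at a time, each call could be wrapped in a before/after measurement. The following is only a sketch under stated assumptions: vsize_bytes, measure_growth and build_and_discard are hypothetical names, it assumes a Linux /proc filesystem, and the workload is a stand-in for the real fetch and parse steps, not HTML::TreeBuilder itself.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the process's virtual memory size in bytes (field 23 of
# /proc/<pid>/stat on Linux), or undef if it cannot be read.
sub vsize_bytes {
    open(my $fh, '<', "/proc/$$/stat") or return;
    my $line = <$fh>;
    close $fh;
    return unless defined $line;
    # Field 2 (the command name) is parenthesised and may contain spaces,
    # so only split what follows the last ')'.
    my ($rest) = $line =~ /^.*\)\s+(.*)$/s or return;
    my @fields = split ' ', $rest;    # $fields[0] is field 3 (state)
    return $fields[20];               # field 23: vsize
}

# Run a code reference and report how much vsize changed across the call.
# Returns (delta_in_bytes, result_of_call); delta is undef without /proc.
sub measure_growth {
    my ($code, @args) = @_;
    my $before = vsize_bytes();
    my $result = $code->(@args);
    my $after  = vsize_bytes();
    my $delta  = (defined $before && defined $after) ? $after - $before : undef;
    return ($delta, $result);
}

# Hypothetical stand-in workload: builds and discards many small hashes.
sub build_and_discard {
    my $n = shift;
    my @nodes = map { +{ id => $_, text => 'x' x 50 } } 1 .. $n;
    return scalar @nodes;
}

my ($delta, $count) = measure_growth(\&build_and_discard, 100_000);
printf "built %d nodes, vsize change: %s bytes\n",
    $count, defined $delta ? $delta : 'N/A';
```

In the test script, the same wrapper could be placed around geturllength(), buildtree() and formattext() in turn to see which call accounts for the growth.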
Is there a way to get this memory back after processing a large page? Does
perl have a way to force garbage collection à la Java?
I've seen this behavior under perl 5.003 (RH Linux 6.1), 5.6 (RH Linux 7.0)
and 5.6.1 (RH Linux 7.1).
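One possible workaround, offered as a sketch and not as a confirmed fix for the behavior above: do the memory-heavy work in a forked child process and pass only the small result back through a pipe. Whatever the child allocated goes back to the operating system when it exits, so the parent never grows. The names run_in_child and heavy_step are hypothetical, and heavy_step is a stand-in workload where the real script would fetch and parse a page.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Run $code->(@args) in a forked child and return the one line of text it
# produces, or undef if the child failed. The child's memory is reclaimed
# by the operating system when it exits.
sub run_in_child {
    my ($code, @args) = @_;
    pipe(my $reader, my $writer) or die "pipe failed: $!";
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;

    if ($pid == 0) {                      # child
        close $reader;
        my $ok = eval {
            my $result = $code->(@args);
            print {$writer} "$result\n";
            close $writer;                # flushes the result to the pipe
            1;
        };
        # _exit skips END blocks and does not re-flush buffers that were
        # inherited from the parent.
        POSIX::_exit($ok ? 0 : 1);
    }

    close $writer;                        # parent
    my $line = <$reader>;
    close $reader;
    waitpid($pid, 0);
    return undef if $? != 0 || !defined $line;
    chomp $line;
    return $line;
}

# Hypothetical stand-in for the expensive step (e.g. building a large tree).
sub heavy_step {
    my $n = shift;
    my @nodes = map { +{ id => $_, text => 'x' x 50 } } 1 .. $n;
    return scalar @nodes;
}

my $count = run_in_child(\&heavy_step, 50_000);
print defined $count ? "child processed $count nodes\n" : "child failed\n";
```

The trade-off is one fork per page and a result that has to be serialised as text, which is acceptable when only a length or a short summary needs to come back.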
Curt
-------------------------
#!/usr/bin/perl
# usage: ./memtest < input_file
use strict;
use HTML::FormatText;
use HTML::TreeBuilder;
use LWP::UserAgent;
use HTTP::Request;

# render the parsed tree as plain text; called by buildtree()
sub formattext
{
    my $html = shift;
    my $formatter = HTML::FormatText->new(leftmargin=>0, rightmargin=>250);
    my $ascii = $formatter->format($html);
    return $ascii;
}

# parse the response body into a tree; called by geturllength()
sub buildtree
{
    my $Response = shift;
    my $html = HTML::TreeBuilder->new();
    $html->parse($Response->content);
    formattext($html);
    $html = $html->delete; # explicitly destroy the tree
}

# fetch a URL and return the response length; called by main loop
sub geturllength
{
    my $URL = shift;
    my $UA = LWP::UserAgent->new();
    my $Request = HTTP::Request->new(GET => $URL);
    my $Response = $UA->request($Request);
    print "Error retrieving $URL\n" if ($Response->is_error());
    buildtree($Response);
    return length($Response->as_string);
}

# return amount of memory used
sub memused
{
    local *memused_TMP_FILE;
    open(memused_TMP_FILE, "</proc/$$/stat") or return "N/A";
    my $stat_line = <memused_TMP_FILE>;
    close memused_TMP_FILE;
    my @fields = split(' ', $stat_line);
    return $fields[22]; # field 23 of /proc/<pid>/stat: vsize, in bytes
}

print "resp. lngth\tMem. used\tChange\tURL retrieved\n";
my $lastused = 0;
while (<STDIN>)
{
    chomp $_;
    my $length = geturllength($_);
    sleep 5; # wait for proc file to be updated?
    my $used = memused();
    my $delta = $used - $lastused;
    print "$length\t$used\t$delta\t$_\n";
    $lastused = $used;
}
-----------------
results from arbitrarily selected web pages:
resp. lngth Mem. used Change URL retrieved
204210 8081408 8081408 http://www.iawa.org/members.html
119468 9183232 1101824 http://www.iaw.on.ca/~fridguy/cgi-bin/db.cgi
345981 11227136 2043904 http://www.ibabowl.com/LocalData.htm
123037 11137024 -90112 http://www.ibac.org/Bulletins/ibac_b00-2.htm
2641177 82894848 71757824 http://www.furman.edu/admin/alumni/registry/visitors2.html
2580581 82657280 -237568 http://www.furman.edu/admin/alumni/registry/visitors.html
164452 75087872 -7569408 http://www.i-base.org.uk/publications/bulletins/htb2/htb2.html
152871 74932224 -155648 http://www.ibasis.net/news/pr01302001.htm
463743 75554816 622592 http://www.ibasis.net/news/pr07192000a.htm
100676 74719232 -835584 http://www.ibat.org/Vend2.htm
170515 74891264 172032 http://www.ibb.hr/komponente.html
188279 74809344 -81920 http://www.ibcmc.com/browsedb2.asp
123066 74866688 57344 http://www.ibcsports.com/west_virginia_state_bb_2_27.htm
Error retrieving http://www.ibegcom.com/company.htm
120 74620928 -245760 http://www.ibegcom.com/company.htm
Error retrieving http://www.ibegcom.com/units.htm
120 74620928 0 http://www.ibegcom.com/units.htm
159397 74780672 159744 http://www.iberbyte.es/iberbyte/F_Productos_todos.html
140987 74764288 -16384 http://www.ibertel.com/atlantis/tarcon.html
199722 75022336 258048 http://www.ibfnet.de/katalog/software/softwarekommunikation.htm
125819 74870784 -151552 http://www.ib.hu-berlin.de/~wumsta/uk/plan.html
103928 74723328 -147456 http://www.ibia.org/news.htm
135435 74891264 167936 http://www.ibia.org/policy.htm
121747 75218944 327680 http://www.ibiblio.org/london/agriculture/faqs/1/msg00027.html
200740 75657216 438272 http://www.ibiblio.org/london/permaculture/mailarchives/sanet2/maillist.html
149005 75476992 -180224 http://www.ibiblio.org/london/permaculture/mailarchives/sanet2/threads.html
121177 75587584 110592 http://www.ibiblio.org/pub/academic/agriculture/agronomy/AGMODELS-L/199602xx.agm.html
197070 75722752 135168 http://www.ibiblio.org/pub/academic/agriculture/agronomy/AGMODELS-L/log9503.agmodels-l.html
117475 75325440 -397312 http://www.ibisnet.org/200102/index.html
106425 75325440 0 http://www.iblcham.ch/promo/market.htm
208249 75743232 417792 http://www.ibl.com/worldinfo/appc.html
203534 75530240 -212992 http://www.ibl.com/writerinfo/caribbean/dominicanrepublic.htm
134846 75567104 36864 http://www.ibmlink.ibm.com/cgi-bin/master?xu=guest&xp=&xh=logon&request=announcements&parms=G_294-519
100902 75427840 -139264 http://www.ibmlink.ibm.com/usalets&parms=H_200-288
117587 75325440 -102400 http://www.ibmlink.ibm.com/usalets&parms=H_299-023
161490 75489280 163840 http://www.ibo-ny.com/members3.htm
135041 75325440 -163840 http://www.ibo-ny.com/members4.htm
110164 75436032 110592 http://www.ibo-ny.com/members.htm
115960 75563008 126976 http://www.ibpinetsp.com.br/rede/not_informe.html
147154 75845632 282624 http://www.ibpmt.com/search_0.htm
103657 75325440 -520192 http://www.ibrc.indiana.edu/affiliates.html
133738 75460608 135168 http://www.i-b-r.org/ir00020b.htm
128832 75325440 -135168 http://www.ibss.iuf.net/common/irsbabs.html
147379 75575296 249856 http://www.ibt.ku.dk/nsfk/Newsletter/nk242.html
121759 75325440 -249856 http://www.ibt.ku.dk/nsfk/Newsletter/nk243.htm
151322 75755520 430080 http://www.ibt.ku.dk/nsfk/Newsletter/nk251.htm
111194 75452416 -303104 http://www.ibt.ku.dk/nsfk/Newsletter/nk252.htm
140327 75730944 278528 http://www.ibt.ku.dk/nsfk/newsletter/nk253.htm
142067 75325440 -405504 http://www.ibt.ku.dk/nsfk/newsletter/nk261.htm
189773 75722752 397312 http://www.ibunka.com/translation/dataj.html
107048 75325440 -397312 http://www.ibuyer.net/rate_list.html?cid=338
271704 76124160 798720 http://www.ibw.com.ni/~chiste/
134602 75403264 -720896 http://www.icamo.ind.br/revend_sudeste.htm
118548 75444224 40960 http://www.icann.org/correspondence/cerf-testimony-08feb01.htm
136866 75583488 139264 http://www.icann.org/registrars/accreditation-qualified-list.html
2595444 80711680 5128192 http://www.icann.org/tlds/africa1/APPLICATION AND REGISTRY OPERATOR'S PROPOSAL.htm