Then, is there a non-leaking alternative (short of re-implementing HTML parsing manually...) to HTML::Parse/HTML::FormatText/Sys::AlarmCall? I started using the latter so that the script would not get stuck whenever it hit a page it had trouble downloading... and I can see how that could be the cause of the leaks.
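For what it's worth, here is a sketch of one such alternative: LWP::UserAgent's built-in timeout takes the place of Sys::AlarmCall, and each parse tree is explicitly delete()d so its circular parent/child references don't pin memory. This is a sketch, not tested against your script; it assumes the newer HTML::TreeBuilder interface (the successor to HTML::Parse) and a reasonably recent LWP where $response->decoded_content exists (older versions would use $response->content).

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TreeBuilder;   # successor to the old HTML::Parse interface
use HTML::FormatText;

# Fetch-and-convert loop that should not accumulate memory:
# LWP's own timeout replaces Sys::AlarmCall, and each tree is
# explicitly delete()d to break its circular references.
my $ua = LWP::UserAgent->new(timeout => 30);   # give up on slow pages

for my $url (@ARGV) {
    my $response = $ua->get($url);
    next unless $response->is_success;

    my $tree = HTML::TreeBuilder->new_from_content($response->decoded_content);
    my $text = HTML::FormatText->new(leftmargin => 0)->format($tree);
    $tree->delete;   # break circular refs so perl can reclaim the tree
    print $text;
}
```

The key line is $tree->delete: without it, the parent/child links in the tree keep each other alive no matter what goes out of scope.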
I can live with the script as it is -- it does its job more or less fine, and it's ok to have to reboot the computer after it runs -- but I am curious about what would be a *better* approach to the problem of downloading lots of text from the net.
Thanks again,
Cheers,
Marco
At 12:21 Uhr +0100 22.03.2003, Marco Baroni wrote:
>Speaking of top, another thing I noticed last night after the script had been running for a few hours was that the script was taking up a huge amount of memory, like more than 500M of RSIZE, and this size seemed to be constantly increasing... this surprised me, since the script is not doing anything that, in my naive view, would require progressively larger memory chunks...
use HTML::Parse;
use HTML::FormatText;
use Sys::AlarmCall;
There are two possible problem spots:

1) HTML::Parse builds a tree that contains circular references internally. If it doesn't wrap that in a destructor layer, it will leak (I'm too lazy to check myself).

2) Sys::AlarmCall uses string eval internally, which is a rather bad idea from a performance point of view (and possibly security); it also does not rethrow exceptions, which is reason enough for me not to use it, and it could conceivably expose perl bugs.

(I believe the memory leak that was attributed to Error.pm some time ago, in the context of mod_perl (/AxKit), was really a perl core bug that was fixed recently -- I haven't had time to check or discuss that yet.)
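To illustrate point 2: core perl can do the same job as Sys::AlarmCall with a block eval and alarm(), avoiding the string eval and preserving real exceptions. A minimal sketch (with_timeout is a hypothetical helper name, not part of any module):

```perl
use strict;
use warnings;

# Run a code ref with a timeout using core alarm() and a block eval.
# No string eval is involved, and genuine exceptions from the code
# are rethrown to the caller instead of being swallowed.
sub with_timeout {
    my ($seconds, $code) = @_;
    my $result;
    eval {
        local $SIG{ALRM} = sub { die "timeout\n" };  # trailing \n keeps die from appending " at ... line ..."
        alarm($seconds);
        $result = $code->();
        alarm(0);
    };
    if ($@) {
        alarm(0);                          # make sure the timer is cancelled
        die $@ unless $@ eq "timeout\n";   # rethrow anything that is not our timeout
        return undef;                      # timed out
    }
    return $result;
}
```

Usage would look like: my $page = with_timeout(30, sub { $ua->get($url) }); -- undef means the fetch timed out, anything else died for a real reason and propagates.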
Christian.