Well, as the perpetrator of that bit of hackery, I can certainly
explain why it breaks if you let the root (head) object go away.
Each node holds a weak reference to its parent, plus its offset and
length within the original parsed string. Only the top (root) object
owns the parsed string itself.
When a node prints itself, it walks up through its parents to reach
the original text buffer, then takes the appropriate substring out of
it and prints that.
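To make the failure mode concrete, here is a minimal sketch of that
arrangement, written in Python rather than Squeak Smalltalk. The class
and method names (`ParsedNode`, `RootNode`, `print_string`) are
hypothetical and not the parser's actual API. Each node keeps a weak
reference to its parent plus an offset/length; only the root holds the
text. Under CPython's reference counting, dropping the last strong
reference to the root frees it immediately, so the weak link goes dead
and the child can no longer find its text.

```python
import weakref


class ParsedNode:
    """Hypothetical parse node: a weak link to its parent plus a text span."""

    def __init__(self, parent, offset, length):
        # weakref.ref does not keep the parent alive; None marks the root.
        self._parent_ref = weakref.ref(parent) if parent is not None else None
        self.offset = offset
        self.length = length

    @property
    def parent(self):
        # Dereference the weak link; yields None once the parent is collected.
        return self._parent_ref() if self._parent_ref is not None else None

    def source_text(self):
        # Walk up the parent chain to the node that owns the text buffer.
        node = self
        while not isinstance(node, RootNode):
            node = node.parent
            if node is None:
                raise RuntimeError("root was collected; source text is gone")
        return node.text

    def print_string(self):
        # A node's printed form is just its slice of the original buffer.
        start = self.offset
        return self.source_text()[start:start + self.length]


class RootNode(ParsedNode):
    """Hypothetical root: the only object that holds the parsed string."""

    def __init__(self, text):
        super().__init__(None, 0, len(text))
        self.text = text


doc = RootNode("<p>Hello</p>")
para = ParsedNode(doc, 0, 12)
word = ParsedNode(para, 3, 5)
print(word.print_string())  # Hello

del doc  # drop the only strong reference to the root
try:
    word.print_string()
except RuntimeError as err:
    print(err)
```

Note that `para` itself is still alive after `del doc`; it is only its
link to the root that has gone dead, which matches the "none of the
rest of the parse tree really works" symptom.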
This was really useful during debugging, since I could see exactly
what hunk of text each node thought it represented (especially since
the nodes parse themselves). Reprinting the document should reproduce
the original text buffer; if it doesn't, something is wrong somewhere.
So that makes for a cheap and cheerful integrity check.
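A rough sketch of that kind of integrity check, again in Python with
hypothetical names (`Span`, `reprint`, `integrity_check`). It assumes
that an inner node's text is exactly the concatenation of its
children's text, so reprinting the tree from the leaves should give
back the original buffer. This is an illustration of the idea, not the
parser's own code.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Span:
    """Hypothetical parse-tree node covering text[offset:offset + length]."""
    offset: int
    length: int
    children: list[Span] = field(default_factory=list)


def reprint(node: Span, text: str) -> str:
    # Leaves print their own slice; inner nodes print their children in order.
    # (Assumption: an inner node's text is exactly its children's text.)
    if not node.children:
        return text[node.offset:node.offset + node.length]
    return "".join(reprint(child, text) for child in node.children)


def integrity_check(root: Span, text: str) -> bool:
    # A gap, overlap, or misordered span will usually make the reprint
    # differ from the original buffer.
    return reprint(root, text) == text


text = "<p>Hi</p>"
good = Span(0, 9, [Span(0, 3), Span(3, 2), Span(5, 4)])
bad = Span(0, 9, [Span(0, 3), Span(5, 4)])  # the "Hi" leaf is missing
print(integrity_check(good, text))  # True
print(integrity_check(bad, text))   # False
```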
Anyhow, making the parent weak was perhaps not a great choice, but it
was meant to make some DOM editing operations easier in the future
(anticipating possible JavaScript integration).
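There are two fixes/workarounds: either never let go of the root, or
change the parent code in the parsed node to use strong references.
It amounts to the same thing: either way, the root and its text buffer
stay in memory for as long as you are using any part of the tree.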
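Both workarounds can be sketched side by side in Python, with
hypothetical names (`Node`, `strong_parent`, `kept_roots`) standing in
for whatever the Smalltalk code actually uses. The `strong_parent`
flag switches the parent link from a weak reference to a plain strong
one, and `kept_roots` is simply a long-lived list that keeps every root
reachable. The expected outputs assume CPython, where an object with no
remaining strong references is freed immediately.

```python
import weakref


class Node:
    """Hypothetical parse node whose parent link is either weak or strong."""

    def __init__(self, parent=None, strong_parent=False):
        if parent is None:
            self._get_parent = lambda: None  # root: no parent
        elif strong_parent:
            # Fix 2: a strong link keeps every ancestor, including the root
            # and the text it owns, alive while this node is reachable.
            self._get_parent = lambda: parent
        else:
            # Original design: the link goes dead once nothing else holds
            # the parent.
            self._get_parent = weakref.ref(parent)

    @property
    def parent(self):
        return self._get_parent()


def child_of_dropped_root(strong_parent):
    # Build root -> child, then drop the only outside handle on the root.
    root = Node()
    child = Node(root, strong_parent=strong_parent)
    del root
    return child


# Fix 1: keep every root reachable yourself, e.g. in a long-lived list.
kept_roots = []


def child_with_kept_root():
    root = Node()
    kept_roots.append(root)
    return Node(root)  # weak link, but kept_roots still holds the root


print(child_of_dropped_root(strong_parent=False).parent)  # None
print(child_of_dropped_root(strong_parent=True).parent is not None)  # True
print(child_with_kept_root().parent is kept_roots[-1])  # True
```

The memory cost is the same in both cases: the root and its original
text stay resident for as long as you hold on to the tree. The only
difference is whether you hold the root explicitly or let the child
nodes hold it for you.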
On Jul 30, 2008, at 7:38 AM, Marcin Tustin wrote:
Hello everyone, a slightly involved and multi-part question:
I'm using the package at http://www.squeaksource.com/htmlcssparser
(HTML/CSS Parser, or "the parser") to scrape multiple pages (about
two or three new ones a day, plus about a thousand existing pages), so
I can extract parts of them to put into an RSS feed. If I let the root
object for a parse (the Validator's DOM object) be garbage
collected, none of the rest of the parse tree really works (because
then other objects only referred to weakly get collected, AFAICT).
So, my first question is whether there's a way to assess what kind
of memory overhead there would be for keeping each of these objects
hanging around indefinitely.
My second is whether anyone has any advice for another way to do it
- by using a different parser, or by copying the data into a
different structure somehow, or something else.
_______________________________________________
Beginners mailing list
Beginners@lists.squeakfoundation.org
http://lists.squeakfoundation.org/mailman/listinfo/beginners