And Sean M. Burke writes:
 - 
 - I think it'd be trivial for someone who knew the XML modules (i.e.,
 - not I, not just now) to write something that'd return an XML clone of
 - an HTML tree.

You mean an XHTML clone...  ;)  Too many `standards.'
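
For what it's worth, here is a rough sketch of what such a clone could
look like. It is in Python rather than on top of the modules discussed
above, and it uses only the standard library's html.parser and
xml.etree.ElementTree. The names TreeCloner, close_tree and
html_to_xhtml are made up for illustration. It assumes a single root
element, and it does not apply HTML's implied-end-tag rules, so an
unclosed <p> simply nests inside the previous one.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TreeCloner(HTMLParser):
    """Hypothetical helper: replays HTML parse events into an XML tree."""

    def __init__(self):
        super().__init__()
        self.builder = ET.TreeBuilder()
        self.open_tags = []  # non-void elements still waiting for an end tag

    def handle_starttag(self, tag, attrs):
        # Valueless attributes (e.g. "disabled") get their own name as value.
        xml_attrs = {k: (v if v is not None else k) for k, v in attrs}
        self.builder.start(tag, xml_attrs)
        if tag in VOID:
            self.builder.end(tag)
        else:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag, ignore it
        # Close any children left open, then the element itself.
        while True:
            closed = self.open_tags.pop()
            self.builder.end(closed)
            if closed == tag:
                break

    def handle_data(self, data):
        self.builder.data(data)

    def close_tree(self):
        self.close()
        while self.open_tags:  # close whatever the page never closed
            self.builder.end(self.open_tags.pop())
        return self.builder.close()


def html_to_xhtml(html):
    """Return a well-formed XML serialization of the HTML input."""
    cloner = TreeCloner()
    cloner.feed(html)
    return ET.tostring(cloner.close_tree(), encoding="unicode")
```

Feeding it something like "<html><body><p>Hi<br>there<p>Next</body></html>"
should give back markup that an XML parser accepts, which is all
"XHTML clone" means here.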

And I know I don't have the time right now.  I shouldn't even be
doing this, but I need it to get someone's mail off of Hotmail.
All the other retrievers suck, so...

BTW, 4DOM for Python has an HTML DOM generator if you want to
look at what they've done.

 - BTW, the DOM makes my head hurt.  Aside from allowing you to
 - manipulate trees the same in any language, I don't see the point of
 - the DOM.  I've got the whole spec here and have been trying to read
 - it, but it's so so foul.

I'm more interested in XPath.  _If_ it's powerful enough to help
with screen-scraping web pages, multiple implementations can pool
resources into a common set of XPath statements that do most of
the scraping.  More people with an interest implies more available
maintainers, so the XPath statements will likely be updated quickly
when the site changes.

And by multiple implementations, I don't just mean implementations
of the same damn thing in people's language of the week.  I mean
different screen-scrapers with different intents working off the
same data.
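
To make that concrete, here is a minimal sketch in Python, assuming the
page has already been turned into well-formed XHTML. The page layout,
the SHARED_XPATHS table and both functions are hypothetical. The
statements stick to the XPath subset that xml.etree.ElementTree
supports. The point is that the two scrapers do different jobs but
share one set of statements, so a site change means fixing the table
once.

```python
import xml.etree.ElementTree as ET

# Hypothetical shared statements for a made-up webmail inbox page.
SHARED_XPATHS = {
    "message_rows": ".//table[@id='inbox']/tr",
    "sender": "./td[@class='from']",
    "subject": "./td[@class='subject']/a",
}

# Invented sample page, standing in for a real (cleaned-up) download.
SAMPLE_PAGE = """<html><body><table id="inbox">
<tr><td class="from">alice@example.com</td>
    <td class="subject"><a href="/msg/1">Lunch?</a></td></tr>
<tr><td class="from">bob@example.com</td>
    <td class="subject"><a href="/msg/2">Re: DOM</a></td></tr>
</table></body></html>"""


def list_subjects(page):
    """Scraper one: just show the subject lines."""
    root = ET.fromstring(page)
    rows = root.findall(SHARED_XPATHS["message_rows"])
    return [row.find(SHARED_XPATHS["subject"]).text for row in rows]


def message_links(page):
    """Scraper two: collect (sender, link) pairs for a mail retriever."""
    root = ET.fromstring(page)
    rows = root.findall(SHARED_XPATHS["message_rows"])
    return [
        (row.find(SHARED_XPATHS["sender"]).text,
         row.find(SHARED_XPATHS["subject"]).get("href"))
        for row in rows
    ]


print(list_subjects(SAMPLE_PAGE))
print(message_links(SAMPLE_PAGE))
```

Whether real pages are regular enough for this to hold up is exactly
the "_If_" above.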

Jason
