Mozilla crawler

Alex Leduc Thu, 03 Jul 2003 09:33:33 -0700

[If you think there is a better newsgroup than this one where I could get an answer, please tell me.]

I'd like to use Mozilla to build a web crawler, written in C++, that fetches HTML pages (like wget) and extracts all the DOM text nodes from each page. It would be a standalone application that uses whatever Mozilla components are needed to accomplish this.

I'd like to get some pointers on where to start. Would the GRE be ideal for this task? What library/component should I use to:

- browse a site automatically by following links
- extract the text nodes using Mozilla's internal DOM representation of the page
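
To make the two tasks concrete, here is a rough sketch of the traversal in question. It does not use any real Mozilla interface: `Node`, `makeElement`, `makeText`, `buildSamplePage` and `collect` are hypothetical stand-ins invented for illustration, and a real crawler would presumably walk whatever DOM objects Mozilla exposes instead. The sketch only shows the shape of the walk: one depth-first pass that gathers text nodes and the link targets to follow next.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node; NOT a Mozilla type.
struct Node {
    enum class Kind { Element, Text };
    Kind kind;
    std::string value;  // tag name for elements, character data for text nodes
    std::string href;   // link target; only meaningful on "a" elements
    std::vector<std::unique_ptr<Node>> children;
};

// Helper (assumed, for illustration): create an element node.
std::unique_ptr<Node> makeElement(std::string tag, std::string href = "") {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Element;
    n->value = std::move(tag);
    n->href = std::move(href);
    return n;
}

// Helper (assumed, for illustration): create a text node.
std::unique_ptr<Node> makeText(std::string text) {
    auto n = std::make_unique<Node>();
    n->kind = Node::Kind::Text;
    n->value = std::move(text);
    return n;
}

// Toy page equivalent to: <body>Hello <a href="next.html">next page</a></body>
std::unique_ptr<Node> buildSamplePage() {
    auto body = makeElement("body");
    body->children.push_back(makeText("Hello "));
    auto link = makeElement("a", "next.html");
    link->children.push_back(makeText("next page"));
    body->children.push_back(std::move(link));
    return body;
}

// Depth-first walk: append every text node's data to `texts` and every
// non-empty link target to `links`, both in document order.
void collect(const Node& node,
             std::vector<std::string>& texts,
             std::vector<std::string>& links) {
    if (node.kind == Node::Kind::Text) {
        texts.push_back(node.value);
        return;
    }
    if (node.value == "a" && !node.href.empty()) {
        links.push_back(node.href);
    }
    for (const auto& child : node.children) {
        collect(*child, texts, links);
    }
}
```

A crawler built this way would call `collect` on each fetched page, store the gathered text, and queue the gathered links (after resolving them against the page URL and skipping ones already visited) as the next pages to fetch.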

Alexandre Leduc
