[If you think there is a better newsgroup than this one where I could get an answer, please tell me.]

I'd like to use Mozilla to build a web crawler in C++ that fetches HTML pages (like wget) and extracts all the DOM text nodes from each page. It would be a standalone application that uses whatever Mozilla components are needed to accomplish this.

I'd like to get some pointers on where to start. Would the GRE be ideal for this task? What library/component should I use to:

- Browse a site automatically by following links.
- Extract the text nodes using Mozilla's internal DOM representation of the page.
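
To make the second item concrete, this is roughly the traversal I have in mind. It is only a sketch: Node, makeText, makeElement, collectText and demoTextNodes are made-up names standing in for whatever Mozilla's DOM interfaces actually provide, which is what I'm asking about. It assumes a C++14 compiler.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a DOM node. This is NOT a Mozilla type.
struct Node {
    enum class Type { Element, Text };
    Type type = Type::Element;
    std::string value;  // tag name for elements, character data for text nodes
    std::vector<std::unique_ptr<Node>> children;
};

// Hypothetical helper: create a text node holding `data`.
std::unique_ptr<Node> makeText(const std::string& data) {
    auto n = std::make_unique<Node>();
    n->type = Node::Type::Text;
    n->value = data;
    return n;
}

// Hypothetical helper: create an element node with the given tag name.
std::unique_ptr<Node> makeElement(const std::string& tag) {
    auto n = std::make_unique<Node>();
    n->type = Node::Type::Element;
    n->value = tag;
    return n;
}

// Depth-first walk in document order; appends the data of every
// text node found under `node` (including `node` itself) to `out`.
void collectText(const Node& node, std::vector<std::string>& out) {
    if (node.type == Node::Type::Text) {
        out.push_back(node.value);
        return;
    }
    for (const auto& child : node.children) {
        collectText(*child, out);
    }
}

// Builds a tiny tree equivalent to <p>Hello <b>world</b>!</p>
// and returns its text nodes in document order.
std::vector<std::string> demoTextNodes() {
    auto p = makeElement("p");
    p->children.push_back(makeText("Hello "));
    auto b = makeElement("b");
    b->children.push_back(makeText("world"));
    p->children.push_back(std::move(b));
    p->children.push_back(makeText("!"));

    std::vector<std::string> out;
    collectText(*p, out);
    return out;
}
```

In the real crawler the tree would come from Mozilla's parser rather than being built by hand, and the same kind of walk could also collect link targets to follow.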



Alexandre Leduc


