Hello All,

I have just started working on a project involving dynamic web scraping
(extracting data from web pages) in situations where there is no defined
XML interface. This involves inspecting several complex sites that use
JavaScript to generate the data that needs to be scraped. User
interaction is also critical, as in some cases a user clicking a button
causes JavaScript to be executed.

The structure and content of such sites often change, so essentially I
need to build a system that lets a user easily define web-scraping tasks
without any programming.

However, to do this I must obtain the final DOM of web pages after the
application of JavaScript and user input. I was hoping to use
Mozilla/Gecko/WebClient to generate the DOM for a specified target URL
and then somehow pass it to Java. Is this possible using Mozilla
technologies? Is anyone aware of the status of WebClient, and is this
project continuing? Are there any other interesting pointers,
experiences or ideas on this topic?

Thanks for all your time,

Tim

_______________________________________________
mozilla-embedding mailing list
[EMAIL PROTECTED]
http://mail.mozilla.org/listinfo/mozilla-embedding
