On 2/23/07, Stefano Mazzocchi <[EMAIL PROTECTED]> wrote:
> Ever since working on Solvent and Piggy Bank, we have been toying with
> the idea of using the same javascript scrapers to power a server-side
> headless crawling agent that could perform data extraction and scraping
> in a more automated way.

Count me in. :-)

> Right now, it does not scrape, but it fetches the URL that you pass it
> thru a RESTful web service, it executes the javascript and builds the
> DOM, waits 3 seconds and returns you the serialization of the page DOM.

For the lack of a reliable "(javascript) rendering done" callback, I presume?

> 3) crowbar's web service will also perform query operations on the
> resulting DOM directly, for example as a way to obtain links it's
> sufficient to ask for the "//A" xpath. This will radically simplify the
> architecture of the crawling agents that will drive the fetching frontends.

Nice. More info on how to (or where-to-rtfm) would be much appreciated. :-)

> There is still a lot of work to be done before I can see people using
> this for real, but I wanted to advertise the fact that it's now starting
> to function and we have a clear design direction that is much easier and
> solid to work with so that other interested parties might come in and
> help out.

I'm fairly sure I'll join in, here too. Feel free to drop over to the
[EMAIL PROTECTED] as and when you find it appropriate.

-- 
 / Johan Sundström, http://ecmanaut.blogspot.com/
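
For anyone wondering what the "//A" query quoted above amounts to in practice, here is a minimal sketch in Python. It does not talk to crowbar's web service, whose request format is not described in this thread. It only applies an equivalent XPath to a DOM that has already been serialized to a well-formed XML/XHTML string. The function name `extract_links` and the sample markup are hypothetical, and the lowercase `a` assumes the serialization uses lowercase element names (XPath over XML is case-sensitive, so an uppercase serialization would need "A" instead).

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_links(serialized_dom: str) -> List[str]:
    """Return the href of every <a> element in a well-formed XHTML string.

    Hypothetical helper for illustration; not part of crowbar.
    """
    root = ET.fromstring(serialized_dom)
    # ".//a[@href]" is ElementTree's limited-XPath spelling of "//a[@href]":
    # all descendant <a> elements that carry an href attribute.
    return [anchor.attrib["href"] for anchor in root.findall(".//a[@href]")]


# Made-up sample standing in for a serialized page DOM.
sample = """<html><body>
<p>See <a href="http://simile.mit.edu/">SIMILE</a> and
<a href="http://ecmanaut.blogspot.com/">ecmanaut</a>.</p>
<a name="anchor-without-href">no link here</a>
</body></html>"""

print(extract_links(sample))
# -> ['http://simile.mit.edu/', 'http://ecmanaut.blogspot.com/']
```

Real-world pages are rarely well-formed XML, so a crawler built this way would depend on the fetching side handing back a clean serialization, which is what the quoted description suggests it does.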
