subject:"[xwiki-devs] what's the best way to scrape an HTML document with Xwiki"

Re: [xwiki-devs] what's the best way to scrape an HTML document with Xwiki

2009-07-05 Thread Niels Mayer

On Thu, Jun 18, 2009 at 11:50 PM, Pascal Voitot <pascal.voitot@gmail.com> wrote: I agree with Vincent... Groovy is the easiest solution... In the past, I tried another weird solution, which consisted of integrating a JavaScript rendering engine such as Rhino on the server side... then

Re: [xwiki-devs] what's the best way to scrape an HTML document with Xwiki

2009-06-19 Thread Pascal Voitot

On Thu, Jun 18, 2009 at 9:04 PM, Vincent Massol <vinc...@massol.net> wrote: Hi Niels, You could easily call $xwiki.getExternalURL() which returns the content at a URL. Then you can use our XHTML parser to generate an XDOM and then do whatever you want with it. One small issue: the renderer

[xwiki-devs] what's the best way to scrape an HTML document with Xwiki

2009-06-18 Thread Niels Mayer

Is there anything like the Xwiki-feed-plugin except that instead of fetching a feed, it would fetch an HTML document via HTTP, returning a DOM structure that can be scanned or filtered by API calls, e.g.: $fetchedDom = $xwiki.FetchPlugin.getDocumentDOM("http://nielsmayer.com") $images =

Re: [xwiki-devs] what's the best way to scrape an HTML document with Xwiki

2009-06-18 Thread Vincent Massol

Hi Niels, You could easily call $xwiki.getExternalURL() which returns the content at a URL. Then you can use our XHTML parser to generate an XDOM and then do whatever you want with it. One small issue: the renderer is not available in the xwiki content right now. But if you're doing
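
For readers following the thread, the "fetch the page, parse it into a DOM, then pick out the images" idea can be sketched in plain Java (which Groovy can also run largely unchanged). This is only an illustration under stated assumptions: it does not use the XWiki XHTML parser or XDOM, it uses the JDK's built-in XML parser, so it only works on well-formed XHTML, and the page text is assumed to have been fetched already. The class and method names (ImageScraper, imageSources) are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of XWiki: extracts image URLs from XHTML text.
public class ImageScraper {

    /** Parses well-formed XHTML and returns the src attribute of every img element. */
    public static List<String> imageSources(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Xerces feature supported by the JDK's default parser: do not download
        // the DTD named in a DOCTYPE declaration while parsing.
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document dom = builder.parse(new InputSource(new StringReader(xhtml)));

        // Walk every <img> element in document order and collect its src value.
        NodeList images = dom.getElementsByTagName("img");
        List<String> sources = new ArrayList<>();
        for (int i = 0; i < images.getLength(); i++) {
            Element img = (Element) images.item(i);
            if (img.hasAttribute("src")) {
                sources.add(img.getAttribute("src"));
            }
        }
        return sources;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for content that would normally be fetched over HTTP.
        String page = "<html><body><img src=\"a.png\"/>"
                    + "<p><img src=\"b.jpg\" alt=\"b\"/></p></body></html>";
        System.out.println(imageSources(page)); // prints [a.png, b.jpg]
    }
}
```

Real-world HTML is often not well-formed, so a strict XML parser like this one will reject many pages; a tolerant HTML parser would be needed for those.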