On 8/6/05, Bertrand Delacrétaz <[EMAIL PROTECTED]> wrote:
...
> Cocoon (http://cocoon.apache.org) will allow you to build pipelines to
> parse the HTML (using JTidy or the NekoHTML parser), process it via
> XSLT transforms to clean it up and feed it to java objects for storage,
> or go directly to SQL statements via its SQLTransformer which executes
> SQL statements embedded in XML documents.

I like the idea of using JTidy since that's what I'm most familiar
with. I'll take a look at Cocoon. The immediate goal is to get the
HTML inserted into a database by any means.

> An alternative, especially if it's a one-off job, would be to build
> your own pipeline using NekoHTML, Xalan, and commons Digester or
> another XML-to-beans mapper to build your java objects, using ant to
> combine these tools.
>
> -Bertrand

Where does Xalan fit into this? Xalan is an XSLT processor, but what
does that really mean? Is Xalan the "engine" that does the actual
transform from HTML to XML, based on what the XSLT stylesheet specifies?

I'm trying to find a "transforms 101" manual or example where some XSLT
is used to transform HTML to XML. I imagine that this isn't so unusual.

The Xalan description reads: "Xalan-Java is an XSLT processor for
transforming XML documents into HTML, text, or other XML document
types." If so, don't I want the inverse of Xalan, HTML to XML? Is that
Xerces?

Thanks,

Thufir
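
P.S. To make the question concrete, here is a minimal sketch of how I currently picture the XSLT step, so someone can correct me if it's wrong. It assumes the HTML has already been turned into well-formed markup (by JTidy or NekoHTML, which is not shown here), and it uses the standard JAXP API (javax.xml.transform), which as far as I understand picks up Xalan when it is on the classpath. The class name, the sample page, and the stylesheet are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch: only the XSLT step of the pipeline.
// Assumption: the input is already well-formed (tidied) HTML.
class HtmlToXmlSketch {

    // Made-up stylesheet: copies the page title and each paragraph
    // into a small custom XML layout.
    static final String STYLESHEET =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
      + "<xsl:template match=\"/\">"
      + "<page>"
      + "<title><xsl:value-of select=\"/html/head/title\"/></title>"
      + "<xsl:for-each select=\"/html/body/p\">"
      + "<para><xsl:value-of select=\".\"/></para>"
      + "</xsl:for-each>"
      + "</page>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over the input and returns the result as a string.
    static String transform(String wellFormedHtml, String stylesheet)
            throws TransformerException {
        // JAXP locates whichever XSLT processor is available at runtime.
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer =
            factory.newTransformer(new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(wellFormedHtml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample page, standing in for tidied HTML.
        String html = "<html><head><title>Hello</title></head>"
                    + "<body><p>First</p><p>Second</p></body></html>";
        System.out.println(transform(html, STYLESHEET));
    }
}
```

If this picture is right, Xalan never sees the raw HTML: the tidy step produces well-formed markup, and the stylesheet then maps that to whatever XML layout the database loading step wants.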