Convert HTML to XHTML with namespace prefix using Neko + Xerces

Jan Uhlir Thu, 27 Apr 2006 07:06:19 -0700

Alias: How to force a default namespace to use prefix

Sorry if I've missed something important; I'm quite new to namespace issues.
But I'm stuck on the very last problem of the whole transformation
process.


Everything works nicely, except that the XHTML namespace is set as the default
namespace, so no prefix (preferably 'html') is included in the element names
when the document is serialized back to a string.

I'm getting:
<html xmlns="http://www.w3.org/1999/xhtml">
<body> some <b> bold </b> text </body>
</html>

But I need:
<html:html xmlns:html="http://www.w3.org/1999/xhtml">
<html:body> some <html:b> bold </html:b> text </html:body>
</html:html>
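To a namespace-aware consumer, the two outputs above name exactly the same elements; only the spelling differs. The small check below is a sketch that uses only the JDK's built-in DOM parser (the class, constant, and helper names are made up for illustration). It parses each form and prints every element as `{namespaceURI}localName`, which is what namespace-aware tools actually compare. It also shows that a root written as a plain `<html>` with only `xmlns:html` declared would end up in no namespace at all, so the root needs the prefix too.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class ExpandedNames {
    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // what the post currently gets: XHTML as the default namespace
    public static final String DEFAULT_FORM =
        "<html xmlns=\"" + XHTML_NS + "\"><body>x</body></html>";
    // every element, including the root, carries the html: prefix
    public static final String PREFIXED_FORM =
        "<html:html xmlns:html=\"" + XHTML_NS + "\"><html:body>x</html:body></html:html>";
    // prefix declared, but the root itself left unprefixed
    public static final String UNPREFIXED_ROOT =
        "<html xmlns:html=\"" + XHTML_NS + "\"><html:body>x</html:body></html>";

    /** Namespace-aware parse; returns the document element. */
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true); // without this, prefixes are not resolved at all
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Formats an element as {namespaceURI}localName. */
    static String expanded(Element e) {
        return "{" + e.getNamespaceURI() + "}" + e.getLocalName();
    }

    public static String rootName(String xml) {
        return expanded(parseRoot(xml));
    }

    public static String childName(String xml) {
        // the samples contain no whitespace, so the first child is the body element
        return expanded((Element) parseRoot(xml).getFirstChild());
    }

    public static void main(String[] args) {
        for (String xml : new String[] {DEFAULT_FORM, PREFIXED_FORM, UNPREFIXED_ROOT}) {
            System.out.println(rootName(xml) + " / " + childName(xml));
        }
    }
}
```

The first two lines printed are identical (`{http://www.w3.org/1999/xhtml}html / {http://www.w3.org/1999/xhtml}body`), while the third starts with `{null}html`.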

Why? Because in reality I pick pieces of HTML (often corrupt!) from a database,
transform them into valid XHTML, and finally assemble them into another, bigger
XML document with multiple namespaces. In fact, I'm building an RSS/Atom feed.

So my questions are:
How do I force a default namespace to use a prefix?
Is this a job for the parser or for the serializer (transformer)?
How do I choose the prefix name for the namespace? Preferably 'html'.
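One possible approach, sketched under assumptions: in DOM Level 2 the prefix is just part of an element's qualified name, so it can be changed after parsing with `Node.setPrefix`, before the tree reaches the serializer. The sketch below uses the JDK's own namespace-aware `DocumentBuilder` as a stand-in for the Neko step (the class and method names are hypothetical). It moves every element in the XHTML namespace onto the `html` prefix, removes the default `xmlns` declaration from the root and declares `xmlns:html` there instead, and then serializes with the same identity `Transformer` settings as the code below. Whether Neko's DOM behaves identically is an assumption: it requires Neko to produce namespace-aware (DOM Level 2) element nodes, which it should with the namespace features switched on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ForcePrefix {
    public static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Moves every XHTML element onto the given prefix and declares it once on the root. */
    public static void usePrefix(Document doc, String prefix) {
        NodeList all = doc.getElementsByTagNameNS(XHTML_NS, "*");
        for (int i = 0; i < all.getLength(); i++) {
            // setPrefix only changes the qualified name; namespace URI and local name stay the same
            all.item(i).setPrefix(prefix);
        }
        Element root = doc.getDocumentElement();
        // drop the old default declaration (xmlns="...") and declare the prefix instead
        root.removeAttribute("xmlns");
        root.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + prefix, XHTML_NS);
    }

    /** Serializes the DOM with an identity Transformer, as in the original code. */
    public static String serialize(Document doc) {
        try {
            StringWriter sw = new StringWriter();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.transform(new DOMSource(doc), new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stand-in for the Neko step: namespace-aware parse of already well-formed XHTML. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html xmlns=\"" + XHTML_NS + "\">"
                + "<body> some <b> bold </b> text </body></html>");
        usePrefix(doc, "html");
        System.out.println(serialize(doc));
    }
}
```

Because the prefix lives on the DOM nodes, this answers the "parser or serializer" question from the DOM side: the identity serializer simply writes out whatever prefixes the tree carries, and the prefix name is whatever string is passed to `usePrefix`.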

Here is my code:

import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// set up Neko parser: balance tags, lower-case element names,
// and insert/override namespace bindings so elements land in XHTML
DOMParser parser = new DOMParser();

parser.setFeature(
   "http://cyberneko.org/html/features/balance-tags", true);
parser.setProperty(
   "http://cyberneko.org/html/properties/names/elems", "lower");
parser.setFeature(
   "http://cyberneko.org/html/features/override-namespaces",
   true);
parser.setFeature(
   "http://cyberneko.org/html/features/insert-namespaces",
   true);
parser.setProperty(
   "http://cyberneko.org/html/properties/namespaces-uri",
   "http://www.w3.org/1999/xhtml");

// parse the HTML fragment, fix it and get back a full, well-formed XML document
parser.parse(
   new InputSource(
   new StringReader(htmlFragment)));
Document doc = parser.getDocument();

// ..OK, let's serialize it back to a string!

// prepare serializer
StringWriter sw = new StringWriter();
Transformer t = TransformerFactory.newInstance()
  .newTransformer();
t.setOutputProperty(OutputKeys.METHOD, "xml");
t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

// serialize the DOM tree
t.transform(new DOMSource(doc), new StreamResult(sw));
String outputXHTML = sw.toString();
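The plain identity `Transformer` above copies the tree as it is, so another option is to do the renaming inside the transformer with a small stylesheet instead of an identity transform. This is a hedged alternative, not something from the post; the class and constant names are hypothetical. It combines an identity template with a higher-priority template that rebuilds every XHTML element through `xsl:element`, using the name `html:{local-name()}` and an explicit `namespace`, so the result should come out as `html:body`, `html:b`, and so on, with `xmlns:html` declared once. Matching `html:*` only works on a namespace-aware input tree; a DOM built with namespaces switched off would not match.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PrefixStylesheet {
    // Renames every XHTML element to html:<local-name>; copies everything else unchanged.
    public static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:html='http://www.w3.org/1999/xhtml'>"
      + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
      // identity template: attributes, text, comments, non-XHTML elements
      + "<xsl:template match='@*|node()'>"
      + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
      + "</xsl:template>"
      // default priority -0.25 beats node() at -0.5, so this wins for XHTML elements
      + "<xsl:template match='html:*'>"
      + "<xsl:element name='html:{local-name()}' namespace='http://www.w3.org/1999/xhtml'>"
      + "<xsl:apply-templates select='@*|node()'/>"
      + "</xsl:element>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet over any Source, e.g. new DOMSource(doc) from the Neko step. */
    public static String toPrefixed(Source input) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter sw = new StringWriter();
            t.transform(input, new StreamResult(sw));
            return sw.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String in = "<html xmlns='http://www.w3.org/1999/xhtml'>"
                + "<body> some <b> bold </b> text </body></html>";
        System.out.println(toPrefixed(new StreamSource(new StringReader(in))));
    }
}
```

With this variant the DOM from Neko can stay as it is; only the `newTransformer()` call changes to take the stylesheet, and the prefix is chosen in one place in the XSLT.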

P.S.
The NekoHTML parser is a real treasure! It helps a lot with closing HTML
tags, mis-balanced tags, etc. Thanks, Andy.
