On 21 September 2015 at 21:42, David Ennis wrote:

> The challenge you will have is that the HTML in the CDATA is likely indexed
> as text, so the feature listed needs to be on the element containing the
> CDATA.
You can use xdmp:tidy() for that. It does a good job of recovery (in cases where the HTML is really bad). The only time I saw it fail to recover really bad HTML was when the input contained control characters (which we could remove by acting on the binary or string input before calling xdmp:tidy()).

https://docs.marklogic.com/xdmp:tidy

Depending on what exactly you do, you might want the tidied HTML to replace the original, or rather to sit alongside it, so you can still send the original input exactly as it was.

Regards,

-- 
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/
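
For illustration, here is a minimal sketch of the xdmp:tidy() call mentioned above. The input string is a made-up example standing in for the HTML pulled out of the CDATA section, and the variable names are only illustrative. It assumes the documented behaviour that xdmp:tidy() returns two nodes, a status node followed by the tidied XHTML, and it does not cover the control-character cleanup.

```xquery
xquery version "1.0-ml";

(: Hypothetical input: HTML text taken from the CDATA section. :)
let $html := "<p>Some <b>badly nested<i> markup</b></p>"

(: xdmp:tidy returns a sequence of two nodes:
   first a status node, then the tidied XHTML. :)
let $result := xdmp:tidy($html)
let $status := $result[1]
let $xhtml  := $result[2]

(: Return only the tidied XHTML; $status can be inspected for warnings. :)
return $xhtml
```

Whether the returned XHTML then replaces the original text or is stored next to it is the design choice described above.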
