On 21 September 2015 at 21:42, David Ennis wrote:

> The challenge you will have is that the HTML in the CDATA is likely indexed
> as text, so the feature listed needs to be on the element containing the
> CDATA.
>

You can use xdmp:tidy() for that.  It does a good job of recovery, even
when the HTML is really bad.  The only time I have seen it fail to recover
really bad HTML was when the input contained control characters (which we
could remove by acting on the binary or string input before calling
xdmp:tidy()).

https://docs.marklogic.com/xdmp:tidy
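
To illustrate, here is a minimal sketch of that approach, not a definitive
implementation.  The helper local:strip-controls() and the sample input are
hypothetical, and I am assuming the HTML is already available as a string.
xdmp:tidy() returns two nodes, a status node followed by the tidied XHTML,
so the second one is what you usually want.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: drops C0 control characters from a string,
   keeping tab (9), line feed (10) and carriage return (13). :)
declare function local:strip-controls($s as xs:string) as xs:string
{
  fn:codepoints-to-string(
    fn:string-to-codepoints($s)[. ge 32 or . = (9, 10, 13)]
  )
};

(: Made-up sample input: an unclosed <p> and a stray </b>. :)
let $raw    := "<p>hello</b>"
let $clean  := local:strip-controls($raw)
(: First node is the tidy status, second is the cleaned XHTML. :)
let $result := xdmp:tidy($clean)
return $result[2]
```

Looking at $result[1] as well can be useful, since it carries any errors or
warnings reported by tidy.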

Depending on what exactly you are doing, you might want the tidied HTML to
replace the original, or rather to sit alongside it, so that you can still
send the original input exactly as it was.
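
As a rough sketch of the "alongside" option, assuming made-up element names
(entry, original, tidied) that you would replace with whatever fits your own
documents:

```xquery
xquery version "1.0-ml";

(: Made-up sample input standing in for the HTML found in the CDATA. :)
let $original := "<p>hello</b>"
(: Second node returned by xdmp:tidy is the cleaned XHTML. :)
let $tidied   := xdmp:tidy($original)[2]
return
  (: Hypothetical wrapper: the original stays as text, the tidied
     version is stored as real elements next to it. :)
  <entry>
    <original>{ $original }</original>
    <tidied>{ $tidied }</tidied>
  </entry>
```

With a layout like this, index settings can target the tidied element while
the original text is kept untouched for sending back as it was.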

Regards,

-- 
Florent Georges
http://fgeorges.org/
http://h2oconsulting.be/
_______________________________________________
General mailing list
[email protected]
Manage your subscription at: 
http://developer.marklogic.com/mailman/listinfo/general
