Thanks for the tips, folks. Sounds like if I don't want to modify the
content of the HTML I return, I will need to store two copies - one to
search on and one to return - correct? Might be time to increase the size
of our SAN...
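
For the control-character case Florent mentions below, here is a minimal
sketch of that pre-cleaning step. It is in Python purely for illustration,
not MarkLogic code; the helper name is hypothetical, and it assumes the
input has already been decoded to a string and that tab, LF and CR should
be kept:

```python
import re

# Assumption: "control characters" means the C0 range except tab (\x09),
# line feed (\x0a) and carriage return (\x0d), plus DEL (\x7f).
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")


def strip_control_chars(html: str) -> str:
    """Return html with control characters removed (hypothetical helper)."""
    return _CONTROL_CHARS.sub("", html)


if __name__ == "__main__":
    dirty = "<p>bad\x00 markup\x07</p>\n"
    # repr() makes any remaining control characters visible in the output.
    print(repr(strip_control_chars(dirty)))
```

The cleaned string is what would go to the tidy step; the untouched
original is what would be stored for returning as-is.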

On Tue, Sep 22, 2015 at 3:23 AM Florent Georges <[email protected]> wrote:

> On 21 September 2015 at 21:42, David Ennis wrote:
>
>> The challenge you will have is that the HTML in the CDATA is likely
>> indexed as text, so the feature listed needs to be on the element
>> containing the CDATA.
>>
>
> You can use xdmp:tidy() for that.  It does a good job of recovery (in
> cases where the HTML is really bad).  The only time I had it fail to
> recover really bad HTML was when the input contained control characters
> (which we could remove by acting on the binary or string input, before
> calling xdmp:tidy()).
>
> https://docs.marklogic.com/xdmp:tidy
>
> Depending on what exactly you do, you might want the tidied HTML to
> replace the original, or rather to sit alongside it, so you can send the
> original input exactly as it was.
>
> Regards,
>
> --
> Florent Georges
> http://fgeorges.org/
> http://h2oconsulting.be/
>
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>