Thanks for the tips, folks. Sounds like if I don't want to modify the content of the HTML I return, I will need to store two copies - one to search on and one to return - correct? Might be time to increase the size of our SAN...
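
To make sure I have the idea right, here is a rough, language-neutral sketch in Python of the two-copy approach, combined with Florent's suggestion below to remove control characters before tidying. The function names and the `original` / `searchable` keys are made up for illustration and are not part of any MarkLogic API. In MarkLogic itself the cleaned string would presumably then go through xdmp:tidy() before being stored next to the original.

```python
import re

# Control characters that XML 1.0 does not allow.
# Tab (\x09), line feed (\x0a) and carriage return (\x0d) are legal and kept.
_ILLEGAL_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def strip_control_chars(html: str) -> str:
    """Remove control characters that could make a tidy/repair step fail."""
    return _ILLEGAL_CONTROL_CHARS.sub("", html)


def build_record(raw_html: str) -> dict:
    """Keep the untouched original next to a cleaned copy meant for indexing.

    Hypothetical record layout: 'original' is what gets returned to clients,
    'searchable' is what would be tidied and indexed.
    """
    return {
        "original": raw_html,
        "searchable": strip_control_chars(raw_html),
    }
```

The cost is the one I mentioned above: every document is stored twice, so storage roughly doubles for the HTML payload.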
On Tue, Sep 22, 2015 at 3:23 AM Florent Georges <[email protected]> wrote:

> On 21 September 2015 at 21:42, David Ennis wrote:
>
>> The challenge you will have is that the HTML in the CDATA is likely
>> indexed as text, so the feature listed needs to be on the element
>> containing the CDATA..
>>
>
> You can use xdmp:tidy() for that. It does a good job for recovery (in
> cases the HTML is really bad). The only time I had it fail to recover
> really bad HTML, was when the input contained control characters (which we
> could remove by acting on the binary or string input, before calling
> xdmp:tidy().)
>
> https://docs.marklogic.com/xdmp:tidy
>
> Depending on what you do exactly, you might want the tidied HTML to
> replace the original one, or rather to sit aside it, so you can send the
> original input exactly as it was.
>
> Regards,
>
> --
> Florent Georges
> http://fgeorges.org/
> http://h2oconsulting.be/
>
>
> _______________________________________________
> General mailing list
> [email protected]
> Manage your subscription at:
> http://developer.marklogic.com/mailman/listinfo/general
>
