Hi,
I noticed that Tika Server v1.27 (with auto-detection) treats an XHTML
document differently at the /unpack endpoint and the /rmeta endpoint.
My input document is an XHTML document containing an HTML-escaped &
(so &amp;). The output of the /unpack endpoint is text with the &
unescaped, whereas the output of the /rmeta endpoint is text that still
contains the escaped form &amp;.
Is this the expected behavior?
This can easily be reproduced with a simple test.html file containing:
<html>
<body>
Parse &amp; extract
</body>
</html>
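For anyone who wants to reproduce this, here is a minimal sketch that sends the test document to both endpoints. It assumes a Tika Server listening on http://localhost:9998 and that both endpoints accept the document via PUT. The names TIKA_URL, put_document and reproduce are only illustrative, and the output file names are arbitrary. The document is sent with the ampersand written as &amp;, as described above.

```python
import urllib.request

# Assumption: a local Tika Server on port 9998; adjust to your setup.
TIKA_URL = "http://localhost:9998"

# The test document from this message, with the ampersand escaped as &amp;.
TEST_HTML = "<html>\n<body>\nParse &amp; extract\n</body>\n</html>\n"


def put_document(endpoint: str, payload: bytes) -> bytes:
    """PUT the payload to a Tika Server endpoint and return the raw response body."""
    request = urllib.request.Request(
        TIKA_URL + endpoint,
        data=payload,
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def reproduce() -> None:
    """Send the test document to both endpoints and save each raw response."""
    payload = TEST_HTML.encode("utf-8")
    for endpoint in ("/rmeta", "/unpack"):
        body = put_document(endpoint, payload)
        # Save the raw bytes so each response can be inspected by hand.
        out_name = endpoint.strip("/") + ".out"
        with open(out_name, "wb") as out_file:
            out_file.write(body)
        print(endpoint, "->", len(body), "bytes written to", out_name)
```

Calling reproduce() with the server running writes rmeta.out and unpack.out, which can then be compared to see whether the ampersand is escaped in each.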
Regards,
Julien Massiera
Product Manager
France Labs, makers of Datafari Enterprise Search
<https://www.datafari.com/en>
Meet us at Open Source eXPerience 2021 on November 9 and 10
<https://www.opensource-experience.com/>