Florent André
Mon, 25 Jan 2010 06:50:36 -0800
Hello,

I am using AutoDetectParser.parse(java.io.InputStream stream, org.xml.sax.ContentHandler handler, Metadata metadata), and I call it many times with the same ContentHandler.

My problem is that on each parse, Tika sends the XML declaration (<?xml version="1.0" encoding="UTF-8"?>) to the ContentHandler. Because the declaration is repeated, I cannot process the ContentHandler's output with a SAX component (a Cocoon transformer). For example, after running Tika several times, my output is:

    <root>
    <documentparse id="1" <?xml version="1.0" encoding="UTF-8"?>>
    <html> ... content from tika </html>
    <documentparse id="2" <?xml version="1.0" encoding="UTF-8"?>>
    <html> ... content from tika </html>
    </documentparse>

Is there a way to turn off sending the XML declaration?

Thanks in advance,
++
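A minimal sketch of one possible workaround, assuming the declaration is written by the downstream serializer in response to the startDocument() event that every parse() call fires on the handler: wrap the shared handler in a filter that forwards every SAX event except startDocument() and endDocument(). To stay self-contained, the demo uses the JDK's own SAX parser in place of Tika, and the class names (SuppressDocumentEvents, NoDocumentEventsFilter, CountingHandler) are hypothetical. With Tika, the idea would be to pass new NoDocumentEventsFilter(handler) to AutoDetectParser.parse(...) instead of handler; Tika's org.apache.tika.sax package also provides an EmbeddedContentHandler decorator that appears to serve the same purpose and may be worth checking in the version you use.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class SuppressDocumentEvents {

    /** Hypothetical decorator: forwards all SAX events except startDocument()/endDocument(). */
    static class NoDocumentEventsFilter extends XMLFilterImpl {
        NoDocumentEventsFilter(ContentHandler target) {
            // XMLFilterImpl forwards ContentHandler callbacks to this handler.
            setContentHandler(target);
        }

        @Override
        public void startDocument() {
            // Swallowed: the outer document is started once by whoever owns it.
        }

        @Override
        public void endDocument() {
            // Swallowed for the same reason.
        }
    }

    /** Hypothetical target handler that only counts the events it receives. */
    static class CountingHandler extends DefaultHandler {
        int startDocuments = 0;
        int endDocuments = 0;
        int elements = 0;

        @Override
        public void startDocument() { startDocuments++; }

        @Override
        public void endDocument() { endDocuments++; }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    /** Parses two small documents into one shared handler, like calling parse() twice. */
    static CountingHandler run(boolean filtered) throws Exception {
        CountingHandler target = new CountingHandler();
        ContentHandler handler = filtered ? new NoDocumentEventsFilter(target) : target;
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        reader.setContentHandler(handler);
        for (String doc : new String[] {"<html><p>one</p></html>", "<html><p>two</p></html>"}) {
            reader.parse(new InputSource(new StringReader(doc)));
        }
        return target;
    }

    public static void main(String[] args) throws Exception {
        CountingHandler plain = run(false);
        CountingHandler filtered = run(true);
        System.out.println("unfiltered startDocument calls: " + plain.startDocuments);  // 2
        System.out.println("filtered startDocument calls: " + filtered.startDocuments); // 0
        System.out.println("filtered elements: " + filtered.elements);                  // 4
    }
}
```

The element events still reach the target unchanged; only the per-parse document boundaries are dropped, so the single startDocument()/endDocument() pair for the enclosing <root> document is expected to come from the code that builds that document, not from each parse.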