Tim Allison created TIKA-2169:
---------------------------------
Summary: Fix xhtml in combination OCR+metadata extraction from images
Key: TIKA-2169
URL: https://issues.apache.org/jira/browse/TIKA-2169
Project: Tika
Issue Type: Bug
Reporter: Tim Allison

In trunk, I'm getting a nested <html> element for the image's metadata when Tesseract is available:

<html>
  ocr content
  <html>
    ...metadata
  </html>
</html>
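One way to confirm the symptom described above is to count the opening <html> tags in the serialized XHTML output. The following is only a minimal sketch: the helper names (count_html_open_tags, has_nested_html) are hypothetical and not part of Tika, and the script assumes the parser output has already been captured as a string. It does not call Tika or Tesseract.

```python
import re

# Hypothetical helpers, not part of Tika. They inspect an already-serialized
# XHTML string and do not run any parser themselves.

# Matches an opening <html> tag (with or without attributes), never </html>.
_HTML_OPEN = re.compile(r"<html(?=[\s>])", re.IGNORECASE)


def count_html_open_tags(xhtml: str) -> int:
    """Return the number of opening <html> tags found in the string."""
    return len(_HTML_OPEN.findall(xhtml))


def has_nested_html(xhtml: str) -> bool:
    """A well-formed XHTML document should contain exactly one <html> element."""
    return count_html_open_tags(xhtml) > 1


# Shape reported in this issue, with the placeholders kept as written.
reported = "<html> ocr content <html> ...metadata </html> </html>"

if __name__ == "__main__":
    print(has_nested_html(reported))  # prints True
```

A check like this could be turned into a regression assertion once the output is fixed, since the count should then be exactly one.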
