Tim Allison created TIKA-3346:
---------------------------------
Summary: Parsers should only appear once in the "parsed by"
metadata value
Key: TIKA-3346
URL: https://issues.apache.org/jira/browse/TIKA-3346
Project: Tika
Issue Type: Task
Reporter: Tim Allison
[~peterkronenberg] noted on the user list that, since the integration of the
OCR parser was reworked, the default parser and the TesseractOCRParser are
added to the metadata once for every page in a PDF. This only happens with
"inline" OCR. We should restrict the values added to "X-TIKA-ParsedBy" to a
unique list to avoid duplication.
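
The de-duplication proposed above could look roughly like the following
sketch. It is not Tika's actual code: ParsedByTracker and recordParser are
hypothetical names, a plain map stands in for Tika's Metadata object, and the
key string is copied from this issue rather than from the real constant.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a multi-valued metadata object (not Tika's Metadata class).
class ParsedByTracker {
    // Key as written in this issue; the real Tika constant may be spelled differently.
    static final String PARSED_BY = "X-TIKA-ParsedBy";

    // key -> values in insertion order
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Adds parserName under PARSED_BY only if it is not already listed.
    // Returns true if the name was added, false if it was a duplicate.
    boolean recordParser(String parserName) {
        List<String> parsers = values.computeIfAbsent(PARSED_BY, k -> new ArrayList<>());
        if (parsers.contains(parserName)) {
            return false;
        }
        parsers.add(parserName);
        return true;
    }

    // Returns a copy of the recorded parser names, in first-seen order.
    List<String> parsedBy() {
        return new ArrayList<>(values.getOrDefault(PARSED_BY, new ArrayList<>()));
    }

    public static void main(String[] args) {
        ParsedByTracker metadata = new ParsedByTracker();
        metadata.recordParser("org.apache.tika.parser.DefaultParser");
        metadata.recordParser("org.apache.tika.parser.pdf.PDFParser");
        // Simulate inline OCR touching the metadata once per page.
        for (int page = 0; page < 3; page++) {
            metadata.recordParser("org.apache.tika.parser.DefaultParser");
            metadata.recordParser("org.apache.tika.parser.ocr.TesseractOCRParser");
        }
        System.out.println(metadata.parsedBy());
    }
}
```

Checking for membership before adding keeps the first-seen order of parsers,
which a plain HashSet would not. The linear contains() scan should be cheap
here, assuming only a handful of parsers ever touch one document.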
If anyone has a better option, let me know. I was thinking about sending in a
dummy metadata object but that got messy quickly...