Tim Allison created TIKA-3346:
---------------------------------

             Summary: Parsers should only appear once in the "parsed by" metadata value
                 Key: TIKA-3346
                 URL: https://issues.apache.org/jira/browse/TIKA-3346
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


[~peterkronenberg] noted on the user list that, since the integration of the 
OCR parser was reworked, the default parser and the TesseractOCRParser are 
added to the metadata once for every page in a PDF.  This only happens with 
"inline" OCR.  We should restrict the "X-TIKA-ParsedBy" value to a list of 
unique parsers to avoid the duplication.
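
A minimal sketch of the "unique list" idea, not the actual Tika change.  It 
assumes a plain Map standing in for Tika's Metadata object; the class name 
ParsedByRecorder and the method recordParser are hypothetical, and the key 
string is copied from the description above rather than from a Tika constant.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: a plain Map stands in for Tika's Metadata object.
final class ParsedByRecorder {

    // Key name taken from the issue text; the real Tika constant may differ.
    static final String PARSED_BY = "X-TIKA-ParsedBy";

    private ParsedByRecorder() {
    }

    // Appends parserName to the "parsed by" values only if it is not already
    // present. Returns true if the name was added, false if it was a duplicate.
    static boolean recordParser(Map<String, List<String>> metadata, String parserName) {
        List<String> parsers = metadata.computeIfAbsent(PARSED_BY, k -> new ArrayList<>());
        if (parsers.contains(parserName)) {
            return false;
        }
        parsers.add(parserName);
        return true;
    }
}
```

With a check like this, calling recordParser for the default parser and the 
TesseractOCRParser once per page would leave each name in the list once, in 
the order it was first seen, regardless of the page count.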

If anyone has a better option, let me know.  I was thinking about sending in a 
dummy metadata object but that got messy quickly...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
