Tim Allison created TIKA-3298:
---------------------------------

             Summary: Add a "preloadLangs" parameter to TesseractOCRParser
                 Key: TIKA-3298
                 URL: https://issues.apache.org/jira/browse/TIKA-3298
             Project: Tika
          Issue Type: Task
            Reporter: Tim Allison


[~peterkronenberg] has observed on the user/dev lists, and on TIKA-3297 and
TIKA-3296, that tesseract's error message for "lang data doesn't exist" is not
very clear.  We could add a "preloadLangs" option to TesseractOCRParser
(default {{false}}).  If it is set to {{true}} and the parser finds tesseract
during initialization, the parser will call {{tesseract --list-langs}} and store
the languages it reports.  At parse time, if that set is non-empty,
TesseractOCRParser will check the user-requested language against it and, if
the language is missing, throw an exception that clearly tells the user that no
language data exists for the requested language.
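A minimal Java sketch of the proposed check follows. All names are hypothetical: `LangPreloadSketch`, `preloadLangs`, `parseLangs` and `checkLanguage` are illustrative, not Tika's API, and `IllegalArgumentException` stands in for whatever exception Tika would actually throw. The sketch runs `tesseract --list-langs` once, parses the reported language codes, and validates a requested language string. It splits that string on `+` because tesseract's `-l` option accepts combinations such as `eng+fra`. Skipping the "List of available languages" header line is an assumption about tesseract's output, which varies somewhat between versions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of the "preloadLangs" idea; not Tika's TesseractOCRParser code.
final class LangPreloadSketch {

    /** Runs `tesseract --list-langs` once and returns the language codes it reports. */
    static Set<String> preloadLangs(String tesseractPath)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(tesseractPath, "--list-langs");
        // Some older tesseract releases printed the list to stderr, so merge the
        // streams (assumption: this may also capture stray warning lines).
        pb.redirectErrorStream(true);
        Process p = pb.start();
        StringBuilder out = new StringBuilder();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        }
        p.waitFor();
        return parseLangs(out.toString());
    }

    /** Parses the command output, skipping blank lines and the header line. */
    static Set<String> parseLangs(String output) {
        Set<String> langs = new LinkedHashSet<>();
        for (String line : output.split("\\R")) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("List of available languages")) {
                continue;
            }
            langs.add(s);
        }
        return Collections.unmodifiableSet(langs);
    }

    /** Checks each '+'-separated requested language against the preloaded set. */
    static void checkLanguage(Set<String> available, String requested) {
        if (available.isEmpty()) {
            return; // nothing preloaded: leave it to tesseract's own error
        }
        for (String lang : requested.split("\\+")) {
            if (!available.contains(lang)) {
                throw new IllegalArgumentException(
                        "Tesseract language data not found for '" + lang
                        + "'; available languages: " + available);
            }
        }
    }
}
```

In the proposal, `preloadLangs` would run only when the option is {{true}} and tesseract was found. An empty set skips the check entirely, so the default behavior (tesseract's own error) is unchanged.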



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
