Dave Meikle created TIKA-1477:
---------------------------------

             Summary: Add custom header to allow overriding of OCR language
                      to be used in Tika Server
                 Key: TIKA-1477
                 URL: https://issues.apache.org/jira/browse/TIKA-1477
             Project: Tika
          Issue Type: Bug
          Components: server
            Reporter: Dave Meikle
            Assignee: Dave Meikle
            Priority: Minor
             Fix For: 1.7


The _TesseractOCRParser_ relies on language-specific models to accurately OCR
content written in different languages. At present, Tika Server gives clients
no way to choose which language model(s) to use without code changes.

To let clients request processing with specific language models, we should add
a new, optional custom HTTP header (e.g. X-Tika-OCRLanguage) whose value
overrides the language set in the TesseractOCRConfig. The resulting config is
then set on the ParseContext for use during parsing.
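A minimal sketch of the header-resolution step, assuming the header is named
X-Tika-OCRLanguage (the issue only offers that name as an example) and that its
value is a Tesseract language string such as "eng", "chi_sim" or "eng+fra". The
class, constant and method names are hypothetical. Inside Tika Server, the
resolved value would then be passed to TesseractOCRConfig.setLanguage(...) and
the config registered with parseContext.set(TesseractOCRConfig.class, config);
that wiring is left out so the sketch runs without Tika on the classpath.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper illustrating the proposed override; not part of Tika's API.
class OcrLanguageHeader {
    // Header name suggested (as an example) in the issue; not a confirmed Tika constant.
    static final String OCR_LANGUAGE_HEADER = "X-Tika-OCRLanguage";

    // Tesseract language strings look like "eng", "chi_sim" or "eng+fra".
    // Accepting only this shape keeps arbitrary text off the Tesseract command line.
    private static final String LANGUAGE_PATTERN = "[A-Za-z0-9_]+(\\+[A-Za-z0-9_]+)*";

    // Returns the header value when present and non-blank, otherwise the default.
    // Throws IllegalArgumentException for values that are not valid language strings.
    static String resolveLanguage(Map<String, String> headers, String defaultLanguage) {
        // HTTP header names are case-insensitive, so look the header up that way.
        Map<String, String> byName = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        byName.putAll(headers);
        String value = byName.get(OCR_LANGUAGE_HEADER);
        if (value == null || value.trim().isEmpty()) {
            return defaultLanguage;
        }
        String language = value.trim();
        if (!language.matches(LANGUAGE_PATTERN)) {
            throw new IllegalArgumentException("Invalid OCR language: " + language);
        }
        return language;
    }

    public static void main(String[] args) {
        // Header present (in a different case): its value wins over the default.
        System.out.println(resolveLanguage(Map.of("x-tika-ocrlanguage", "eng+fra"), "eng"));
        // Header absent: fall back to the configured default.
        System.out.println(resolveLanguage(Map.of(), "eng"));
    }
}
```

Rejecting a malformed value, rather than silently falling back to the default,
is a design choice in this sketch: it tells the client that its override was not
applied instead of returning OCR output produced with an unexpected model.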



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)