Zarana Parekh created TIKA-2021:
-----------------------------------

             Summary: Improving accuracy of Tesseract parser
                 Key: TIKA-2021
                 URL: https://issues.apache.org/jira/browse/TIKA-2021
             Project: Tika
          Issue Type: Improvement
            Reporter: Zarana Parekh


The Tesseract OCR parser works well on images containing English text. However,
there is room for improvement on alphanumeric and numeric content, which
requires training Tesseract on relevant examples to extract such content from
images more reliably. This kind of customization could help in extracting
serial numbers from images of counterfeit electronics and in other applications
that deal with atypical textual content.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
