Zarana Parekh created TIKA-2021:
-----------------------------------
Summary: Improving accuracy of Tesseract parser
Key: TIKA-2021
URL: https://issues.apache.org/jira/browse/TIKA-2021
Project: Tika
Issue Type: Improvement
Reporter: Zarana Parekh
The Tesseract OCR parser works well on images containing English text. However,
accuracy could be improved for alphanumeric and numeric content, which requires
training Tesseract on the relevant cases in order to better extract content from
images. Such customization can be helpful for extracting serial numbers from
images of counterfeit electronics and for other applications focusing on
atypical textual content.
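
As a rough illustration of the kind of customization described above, the sketch below builds a Tesseract command line that selects a language model with -l and restricts recognition to alphanumeric characters through the tessedit_char_whitelist config variable. It is not the change proposed for the Tika parser itself. The helper name build_tesseract_command, the file names, and the custom model name "serial" are hypothetical, and a whitelist is a lighter-weight stand-in for actual training; whether it is honored may depend on the Tesseract version and OCR engine in use.

```python
import string

# Characters expected in a serial number. This set is an assumption made
# for illustration; adjust it to the content being extracted.
ALPHANUMERIC = string.ascii_uppercase + string.digits


def build_tesseract_command(image_path, output_base, lang="eng",
                            whitelist=ALPHANUMERIC):
    """Return the argument list for a tesseract CLI invocation.

    lang names the traineddata model to use; a custom-trained model
    (hypothetically "serial") would be passed here once installed.
    whitelist limits the characters the recognizer may emit.
    """
    return [
        "tesseract", image_path, output_base,
        "-l", lang,
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute
    # it on a machine where tesseract is installed.
    print(build_tesseract_command("chip.png", "out", lang="serial"))
```

Running the resulting command with subprocess.run(cmd, check=True) would write the recognized text to out.txt, assuming the tesseract binary and the named model are available.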