[ https://issues.apache.org/jira/browse/TIKA-2581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ewan Mellor updated TIKA-2581:
------------------------------
Description:
TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.
With 3.x, the output is <span>Happy</span> but with 4.0 the output is
<span><strong>Happy</strong></span>.
was:
TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.
With 3.x, the output is `<span>Happy</span>` but with 4.0 the output is
`<span><strong>Happy</strong></span>`.
> testOCROutputsHOCR fails with Tesseract 4.0
> -------------------------------------------
>
> Key: TIKA-2581
> URL: https://issues.apache.org/jira/browse/TIKA-2581
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.17
> Reporter: Ewan Mellor
> Priority: Minor
>
> TesseractOCRParserTest.testOCROutputsHOCR fails with Tesseract 4.0.
> With 3.x, the output is <span>Happy</span> but with 4.0 the output is
> <span><strong>Happy</strong></span>.
>
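
One possible way to make the assertion tolerant of both output shapes is sketched below. This is only an illustration, not the actual test code or the fix adopted for this issue: the class name HocrAssertionSketch, the helper containsHappySpan, and the regular expression are all hypothetical. It assumes the only difference between 3.x and 4.0 output is the optional <strong> wrapper described above.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: accept the recognized word with or without a
// <strong> wrapper, so one check passes for both Tesseract 3.x and 4.0 output.
public class HocrAssertionSketch {

    // <span ...>Happy</span>, optionally with <strong>...</strong> around the word.
    private static final Pattern HAPPY_SPAN = Pattern.compile(
            "<span[^>]*>\\s*(?:<strong>)?\\s*Happy\\s*(?:</strong>)?\\s*</span>");

    // Returns true if the hOCR fragment contains the expected span in either form.
    public static boolean containsHappySpan(String hocr) {
        return HAPPY_SPAN.matcher(hocr).find();
    }

    public static void main(String[] args) {
        String tesseract3 = "<span>Happy</span>";                  // 3.x shape from the report
        String tesseract4 = "<span><strong>Happy</strong></span>"; // 4.0 shape from the report
        System.out.println(containsHappySpan(tesseract3));
        System.out.println(containsHappySpan(tesseract4));
    }
}
```

A test could then assert on containsHappySpan(xml) instead of checking for the literal string <span>Happy</span>. Stripping <strong> tags from the output before comparing would be an alternative with the same effect.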