[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matthew Caruana Galizia updated TIKA-2174:
------------------------------------------
Description:
A complete install of Leptonica with Tesseract will add support for formats
that are not declared by TesseractOCRParser. These include JP2, JPX and PPM.
Tesseract produces OCR output fine for JPX images as of this version:
{noformat}
$ tesseract -v
tesseract 3.04.01
leptonica-1.73
libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
{noformat}
However, these types are not declared by getSupportedTypes, so no output is
produced for PDFs that contain JPX images of scanned documents, for example.
was:
Tesseract produces OCR output fine for JPX images as of this version:
{noformat}
$ tesseract -v
tesseract 3.04.01
leptonica-1.73
libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
{noformat}
However, these types are not declared by getSupportedTypes, so no output is
produced for PDFs that contain JPX images of scanned documents, for example.
Summary: Too few formats in support declared by TesseractOCRParser
(was: JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser)
> Too few formats in support declared by TesseractOCRParser
> ---------------------------------------------------------
>
> Key: TIKA-2174
> URL: https://issues.apache.org/jira/browse/TIKA-2174
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.14
> Reporter: Matthew Caruana Galizia
>
> A complete install of Leptonica with Tesseract will add support for formats
> that are not declared by TesseractOCRParser. These include JP2, JPX and PPM.
> Tesseract produces OCR output fine for JPX images as of this version:
> {noformat}
> $ tesseract -v
> tesseract 3.04.01
> leptonica-1.73
> libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
> {noformat}
> However, these types are not declared by getSupportedTypes, so no output is
> produced for PDFs that contain JPX images of scanned documents, for
> example.
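
To illustrate the effect described above, here is a minimal, self-contained
sketch. It does not use the Tika API: the class SupportedTypesSketch, the two
sets and the method wouldBeOcred are hypothetical names, and the "before" set
is illustrative rather than copied from TesseractOCRParser. It assumes only
that a parser is consulted for a media type when that type is in its declared
set, and it uses the commonly seen media type strings for JP2, JPX and PPM.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

public class SupportedTypesSketch {

    // Illustrative "before" set (assumption, not taken from the Tika source).
    static final Set<String> DECLARED_BEFORE =
            unmodifiable("image/png", "image/jpeg", "image/tiff");

    // The same set extended with the formats named in this issue.
    static final Set<String> DECLARED_AFTER =
            unmodifiable("image/png", "image/jpeg", "image/tiff",
                    "image/jp2", "image/jpx", "image/x-portable-pixmap");

    private static Set<String> unmodifiable(String... types) {
        return Collections.unmodifiableSet(new LinkedHashSet<>(Arrays.asList(types)));
    }

    // Simplified dispatch rule: an image is handed to the OCR parser
    // only if its media type appears in the declared set.
    static boolean wouldBeOcred(Set<String> declared, String mediaType) {
        return declared.contains(mediaType);
    }

    public static void main(String[] args) {
        for (String type : new String[] {"image/jpeg", "image/jpx"}) {
            System.out.println(type
                    + " before=" + wouldBeOcred(DECLARED_BEFORE, type)
                    + " after=" + wouldBeOcred(DECLARED_AFTER, type));
        }
    }
}
```

Under this simplified rule, a JPX image embedded in a PDF is skipped with the
first set and passed on for OCR with the second, even though the underlying
tesseract binary can read the file in both cases.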