[ https://issues.apache.org/jira/browse/TIKA-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthew Caruana Galizia updated TIKA-2174:
------------------------------------------
    Description: 
A complete install of Leptonica with Tesseract will add support for formats 
that are not declared by TesseractOCRParser. These include JP2, JPX and PPM.

Tesseract produces OCR output fine for JPX images as of this version:

{noformat}
  $ tesseract -v
  tesseract 3.04.01
   leptonica-1.73
    libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
{noformat}

However, these types are not declared by getSupportedTypes, so no OCR output is 
produced for PDFs that contain JPX images of scanned documents, for example.

  was:
Tesseract produces OCR output fine for JPX images as of this version:

{noformat}
  $ tesseract -v
  tesseract 3.04.01
   leptonica-1.73
    libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
{noformat}

However, these types are not declared by getSupportedTypes, so no OCR output is 
produced for PDFs that contain JPX images of scanned documents, for example.

        Summary: Too few formats in support declared by TesseractOCRParser  
(was: JP2 and JPX (JPEG 2000) support not declared by TesseractOCRParser)

> Too few formats in support declared by TesseractOCRParser
> ---------------------------------------------------------
>
>                 Key: TIKA-2174
>                 URL: https://issues.apache.org/jira/browse/TIKA-2174
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.14
>            Reporter: Matthew Caruana Galizia
>
> A complete install of Leptonica with Tesseract will add support for formats 
> that are not declared by TesseractOCRParser. These include JP2, JPX and PPM.
> Tesseract produces OCR output fine for JPX images as of this version:
> {noformat}
>   $ tesseract -v
>   tesseract 3.04.01
>    leptonica-1.73
>     libjpeg 8d : libpng 1.6.26 : libtiff 4.0.6 : zlib 1.2.5
> {noformat}
> However, these types are not declared by getSupportedTypes, so no OCR output 
> is produced for PDFs that contain JPX images of scanned documents, for 
> example.
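
To illustrate the gap described above, here is a minimal, self-contained sketch 
in plain Java. It does not use the Tika API: the class name, the method names 
and the "currently declared" list are hypothetical stand-ins, and the MIME 
strings for JP2, JPX and PPM are assumptions about how those formats would be 
identified. The only point is that an image whose type is missing from the 
declared set is never handed to the OCR step, even when the local Tesseract 
build can read it.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical stand-in for a parser's declared-types check; not Tika code.
class SupportedTypesSketch {

    // Assumed baseline list, for illustration only.
    private static final String[] DECLARED = {
        "image/png", "image/jpeg", "image/tiff", "image/gif"
    };

    // Formats named in the issue; the MIME strings are assumptions.
    private static final String[] EXTRA = {
        "image/jp2", "image/jpx", "image/x-portable-pixmap"
    };

    // Builds the set of types the sketch parser claims to support.
    static Set<String> supportedTypes(boolean includeExtra) {
        Set<String> types = new LinkedHashSet<>();
        Collections.addAll(types, DECLARED);
        if (includeExtra) {
            Collections.addAll(types, EXTRA);
        }
        return Collections.unmodifiableSet(types);
    }

    // An embedded image reaches OCR only if its type is declared.
    static boolean wouldOcr(String mimeType, Set<String> supported) {
        return supported.contains(mimeType);
    }

    public static void main(String[] args) {
        String embedded = "image/jpx"; // e.g. a scanned page inside a PDF
        System.out.println("before: " + wouldOcr(embedded, supportedTypes(false)));
        System.out.println("after:  " + wouldOcr(embedded, supportedTypes(true)));
    }
}
```

Whether the extra types should always be declared, or only when the installed 
Leptonica build supports them, is a design question this sketch leaves open.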



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
