[jira] [Commented] (TIKA-2444) JP2 codestream files not parsed

Matthew Caruana Galizia (JIRA) Wed, 30 Aug 2017 10:38:38 -0700

    [ https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147657#comment-16147657 ]


Matthew Caruana Galizia commented on TIKA-2444:
-----------------------------------------------

I have no idea. I'm trying to solve a similar problem with raw G4 bytestreams 
that are not contained in a TIFF container.
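For the G4 case, one way to get such a stream into something Tesseract/Leptonica can read is to wrap it in a minimal single-strip TIFF. Below is a minimal sketch of that, assuming the image width and height are known from elsewhere: a raw G4 stream does not record them (in a PDF, for example, they would come from the `Columns` and `Rows` parameters of `CCITTFaxDecode`). The class `G4ToTiff` is hypothetical, not Tika API, and `PhotometricInterpretation = 0` (WhiteIsZero) is an assumption; if the result looks inverted, 1 (BlackIsZero) is the other option.

```java
import java.nio.ByteBuffer;

public final class G4ToTiff {
    private static final short TYPE_SHORT = 3;
    private static final short TYPE_LONG = 4;
    private static final int ENTRIES = 8;
    // 8-byte header, then the IFD: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
    private static final int DATA_OFFSET = 8 + 2 + ENTRIES * 12 + 4; // 110

    private G4ToTiff() {}

    // An IFD entry holding one SHORT, left-justified in the 4-byte value field.
    private static void shortEntry(ByteBuffer b, int tag, int value) {
        b.putShort((short) tag).putShort(TYPE_SHORT).putInt(1).putShort((short) value).putShort((short) 0);
    }

    // An IFD entry holding one LONG.
    private static void longEntry(ByteBuffer b, int tag, int value) {
        b.putShort((short) tag).putShort(TYPE_LONG).putInt(1).putInt(value);
    }

    /** Wraps raw CCITT Group 4 data of the given size in a big-endian, single-strip TIFF. */
    public static byte[] wrap(byte[] g4, int width, int height) {
        ByteBuffer b = ByteBuffer.allocate(DATA_OFFSET + g4.length); // ByteBuffer is big-endian by default
        b.put((byte) 'M').put((byte) 'M').putShort((short) 42).putInt(8); // byte order, magic 42, IFD offset
        b.putShort((short) ENTRIES);
        // IFD entries must appear in ascending tag order.
        longEntry(b, 256, width);       // ImageWidth
        longEntry(b, 257, height);      // ImageLength
        shortEntry(b, 258, 1);          // BitsPerSample: bilevel
        shortEntry(b, 259, 4);          // Compression: CCITT T.6 (Group 4)
        shortEntry(b, 262, 0);          // PhotometricInterpretation: WhiteIsZero (assumed)
        longEntry(b, 273, DATA_OFFSET); // StripOffsets: G4 data follows the IFD
        longEntry(b, 278, height);      // RowsPerStrip: the whole image is one strip
        longEntry(b, 279, g4.length);   // StripByteCounts
        b.putInt(0);                    // no further IFDs
        b.put(g4);
        return b.array();
    }
}
```

Putting the whole image in one strip means the G4 bytes can be copied through untouched. The sketch leaves out XResolution, YResolution and ResolutionUnit, which baseline TIFF asks for; many readers tolerate their absence, but that is not guaranteed. It also relies on the TIFF defaults of FillOrder = 1 (most significant bit first) and T6Options = 0, so the stream must match them.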

Do you know anyone who has experience with image parsing in Java?

> JP2 codestream files not parsed
> -------------------------------
>
>                 Key: TIKA-2444
>                 URL: https://issues.apache.org/jira/browse/TIKA-2444
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.16
>            Reporter: Matthew Caruana Galizia
>              Labels: imageio, images, ocr
>         Attachments: balloon.j2c
>
>
> We've come across some embedded files in the wild that are detected by Tika 
> as {{image/x-jp2-codestream}}. The identification is correct according to a 
> description of the format [1].
> However, no Parser implementation declares support for this format.
> It would make sense to declare support for this format in the Tesseract OCR 
> parser. However, the parser would need to contain functionality that either:
> 1) wraps the codestream in a JP2 container; or
> 2) transcodes the image to PNG.
> This is because while Tesseract supports JP2 (via Leptonica), it doesn't 
> support the raw codestream as a file.
> [1] http://fileformats.archiveteam.org/wiki/JPEG_2000_codestream
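Option 1 looks like the cheaper of the two, since the codestream itself can be copied through unchanged. Below is a minimal sketch in Java, assuming the input is a bare codestream that starts with the SOC (`0xFF4F`) and SIZ (`0xFF51`) markers. It reads the image size, component count and bit depth from the SIZ segment and writes the boxes a JP2 file needs per ISO/IEC 15444-1: the signature box, `ftyp`, a `jp2h` superbox holding `ihdr` and `colr`, and a `jp2c` box holding the codestream. The class `J2kToJp2` is hypothetical, not Tika API. The colour space is an assumption: a raw codestream does not record one, so the sketch guesses greyscale for one component and sRGB for three.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class J2kToJp2 {
    private J2kToJp2() {}

    // A JP2 box: 4-byte big-endian length (including this 8-byte header), 4-char type, payload.
    private static byte[] box(String type, byte[] payload) {
        ByteBuffer b = ByteBuffer.allocate(8 + payload.length);
        b.putInt(8 + payload.length).put(type.getBytes(StandardCharsets.US_ASCII)).put(payload);
        return b.array();
    }

    private static byte[] concat(byte[]... parts) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : parts) out.write(p, 0, p.length);
        return out.toByteArray();
    }

    /** Wraps a raw JPEG 2000 codestream (SOC + SIZ first) in a minimal JP2 file. */
    public static byte[] wrap(byte[] cs) {
        ByteBuffer siz = ByteBuffer.wrap(cs); // ByteBuffer is big-endian by default
        if (cs.length < 42 || siz.getShort(0) != (short) 0xFF4F || siz.getShort(2) != (short) 0xFF51) {
            throw new IllegalArgumentException("not a JPEG 2000 codestream (no SOC/SIZ markers)");
        }
        long width = Integer.toUnsignedLong(siz.getInt(8)) - Integer.toUnsignedLong(siz.getInt(16));   // Xsiz - XOsiz
        long height = Integer.toUnsignedLong(siz.getInt(12)) - Integer.toUnsignedLong(siz.getInt(20)); // Ysiz - YOsiz
        int nc = siz.getShort(40) & 0xFFFF; // Csiz: number of components
        if (nc == 0 || cs.length < 42 + 3 * nc) throw new IllegalArgumentException("truncated SIZ segment");
        byte bpc = cs[42]; // Ssiz of component 0: (bit depth - 1) | 0x80 if signed; same encoding as ihdr BPC
        for (int i = 1; i < nc; i++) {
            // Mixed depths would need BPC = 255 plus a bpcc box; this sketch rejects them instead.
            if (cs[42 + 3 * i] != bpc) throw new IllegalArgumentException("mixed bit depths not supported");
        }
        int enumCs; // ASSUMPTION: colour space guessed from the component count
        if (nc == 1) enumCs = 17;      // greyscale
        else if (nc == 3) enumCs = 16; // sRGB
        else throw new IllegalArgumentException("unsupported component count: " + nc);

        ByteBuffer ihdr = ByteBuffer.allocate(14);
        ihdr.putInt((int) height).putInt((int) width).putShort((short) nc).put(bpc)
            .put((byte) 7)  // C: compression type, always 7 for JPEG 2000
            .put((byte) 1)  // UnkC: 1 = colour space not known for certain (it is guessed)
            .put((byte) 0); // IPR: no intellectual property box
        ByteBuffer colr = ByteBuffer.allocate(7);
        colr.put((byte) 1).put((byte) 0).put((byte) 0).putInt(enumCs); // METH = enumerated, PREC, APPROX, EnumCS

        byte[] brand = "jp2 ".getBytes(StandardCharsets.US_ASCII);
        return concat(
            box("jP  ", new byte[] {0x0D, 0x0A, (byte) 0x87, 0x0A}),     // JPEG 2000 signature box
            box("ftyp", concat(brand, new byte[4], brand)),             // brand, minor version 0, compatibility list
            box("jp2h", concat(box("ihdr", ihdr.array()), box("colr", colr.array()))),
            box("jp2c", cs));                                           // contiguous codestream, unchanged
    }
}
```

The returned bytes could then be written to a temporary `.jp2` file and handed to Tesseract, which is what option 1 describes; whether Leptonica accepts the guessed colour space for every file in the wild has not been checked here.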



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
