[
https://issues.apache.org/jira/browse/TIKA-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147657#comment-16147657
]
Matthew Caruana Galizia commented on TIKA-2444:
-----------------------------------------------
I have no idea. I'm trying to solve a similar problem with raw G4 bytestreams
that are not contained in a TIFF container.
Do you know anyone who has experience with image parsing in Java?
> JP2 codestream files not parsed
> -------------------------------
>
> Key: TIKA-2444
> URL: https://issues.apache.org/jira/browse/TIKA-2444
> Project: Tika
> Issue Type: Bug
> Components: parser
> Affects Versions: 1.16
> Reporter: Matthew Caruana Galizia
> Labels: imageio, images, ocr
> Attachments: balloon.j2c
>
>
> We've come across some embedded files in the wild that are detected by Tika
> as {{image/x-jp2-codestream}}. The identification is correct according to a
> description of the format [1].
> However, no Parser implementation declares support for this format.
> It would make sense to declare support for this format in the Tesseract OCR
> parser. However, the parser would need to contain functionality that either:
> 1) wraps the codestream in a JP2 container; or
> 2) transcodes the image to PNG.
> This is because while Tesseract supports JP2 (via Leptonica), it doesn't
> support the raw codestream as a file.
> [1] http://fileformats.archiveteam.org/wiki/JPEG_2000_codestream
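
Option 1 in the description (wrapping the codestream in a JP2 container) could look roughly like the sketch below. This is not Tika code: the class name Jp2Wrapper and the method wrap are hypothetical. It assumes that all components share one bit depth, that one component means greyscale and three mean sRGB, and that the image dimensions fit in a signed 32-bit int. It reads the image size from the SIZ marker segment and writes the signature, file type and JP2 header boxes in front of a contiguous codestream box.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

/** Hypothetical helper: wraps a raw JPEG 2000 codestream in a minimal JP2 file. */
final class Jp2Wrapper {

    static byte[] wrap(byte[] cs) throws IOException {
        ByteBuffer in = ByteBuffer.wrap(cs); // big-endian by default
        // A codestream starts with SOC (0xFF4F) immediately followed by SIZ (0xFF51).
        if (cs.length < 45 || (in.getShort(0) & 0xFFFF) != 0xFF4F
                || (in.getShort(2) & 0xFFFF) != 0xFF51) {
            throw new IOException("Not a JPEG 2000 codestream");
        }
        // SIZ fields after the marker: Lsiz(2) Rsiz(2) Xsiz(4) Ysiz(4) XOsiz(4) YOsiz(4)
        // XTsiz(4) YTsiz(4) XTOsiz(4) YTOsiz(4) Csiz(2), then Ssiz/XRsiz/YRsiz per component.
        int width = in.getInt(8) - in.getInt(16);   // Xsiz - XOsiz
        int height = in.getInt(12) - in.getInt(20); // Ysiz - YOsiz
        int components = in.getShort(40) & 0xFFFF;  // Csiz
        if (cs.length < 42 + 3 * components) {
            throw new IOException("Truncated SIZ marker segment");
        }
        int depth = cs[42] & 0xFF; // Ssiz of component 0: (bit depth - 1), high bit = signed
        for (int i = 1; i < components; i++) {
            if ((cs[42 + 3 * i] & 0xFF) != depth) {
                // Assumption of this sketch: mixed depths would need a 'bpcc' box.
                throw new IOException("Components with differing bit depths are not handled");
            }
        }
        int colourSpace; // EnumCS value for the 'colr' box
        if (components == 1) {
            colourSpace = 17; // greyscale
        } else if (components == 3) {
            colourSpace = 16; // sRGB (assumed; the codestream itself does not say)
        } else {
            throw new IOException("Unsupported component count: " + components);
        }

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        // JPEG 2000 signature box (12 bytes).
        out.writeInt(12); out.writeBytes("jP  "); out.writeInt(0x0D0A870A);
        // File type box: brand 'jp2 ', minor version 0, one compatible brand 'jp2 '.
        out.writeInt(20); out.writeBytes("ftyp");
        out.writeBytes("jp2 "); out.writeInt(0); out.writeBytes("jp2 ");
        // JP2 header superbox = 8-byte header + ihdr (22 bytes) + colr (15 bytes).
        out.writeInt(45); out.writeBytes("jp2h");
        out.writeInt(22); out.writeBytes("ihdr");
        out.writeInt(height); out.writeInt(width);
        out.writeShort(components);
        out.writeByte(depth); // BPC uses the same encoding as Ssiz
        out.writeByte(7);     // compression type, always 7
        out.writeByte(0);     // colourspace is known
        out.writeByte(0);     // no intellectual property box
        out.writeInt(15); out.writeBytes("colr");
        out.writeByte(1);     // METH = enumerated colourspace
        out.writeByte(0);     // PREC
        out.writeByte(0);     // APPROX
        out.writeInt(colourSpace);
        // Contiguous codestream box holding the original bytes unchanged.
        out.writeInt(8 + cs.length); out.writeBytes("jp2c");
        out.write(cs);
        out.flush();
        return bytes.toByteArray();
    }
}
```

A parser could call something like Jp2Wrapper.wrap(bytes) on the embedded stream and hand the result to Tesseract as a .jp2 file. Whether Leptonica accepts the output for a given file (balloon.j2c, for example) would still need to be checked.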
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)