[
https://issues.apache.org/jira/browse/TIKA-4209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4209:
------------------------------
Description:
[~johanvanderknijff] recently published a great post on multipage TIFFs:
[https://www.bitsgalore.org/2024/03/11/multi-image-tiffs-subfiles-and-image-file-directories]
I hadn't worked on TIFF in a while. I tried out a few sample multipage TIFFs
and found that we are not processing anything beyond the first page/image in a
TIFF. Even worse, we're not populating our
"imagereader:NumImages" metadata value for TIFFs.
It looks like Drew Noakes' metadata-extractor is not yet handling these well:
[https://github.com/drewnoakes/metadata-extractor/issues/648]
There's an example file on that issue:
[https://github.com/drewnoakes/metadata-extractor/files/14052854/color-pages-jpg.zip]
And [~johanvanderknijff] also pointed to TIFFs available here:
[https://www.leadtools.com/support/forum/posts/t10960-]
was:
[~johanvanderknijff] recently published a great post on multipage TIFFs:
https://www.bitsgalore.org/2024/03/11/multi-image-tiffs-subfiles-and-image-file-directories
I hadn't worked on TIFF in a while. I tried out a few sample multipage TIFFs
and found that we are not processing anything beyond the first page/image in a
TIFF. Even worse, we're not populating our
"imagereader:NumImages" metadata value for TIFFs.
It looks like Drew Noakes' metadata-extractor is not yet handling these well:
[https://github.com/drewnoakes/metadata-extractor/issues/648]
There's an example file there:
[https://github.com/drewnoakes/metadata-extractor/files/14052854/color-pages-jpg.zip]
> Improve handling of multipage tiffs
> -----------------------------------
>
> Key: TIKA-4209
> URL: https://issues.apache.org/jira/browse/TIKA-4209
> Project: Tika
> Issue Type: New Feature
> Reporter: Tim Allison
> Priority: Major
>
> [~johanvanderknijff] recently published a great post on multipage TIFFs:
> [https://www.bitsgalore.org/2024/03/11/multi-image-tiffs-subfiles-and-image-file-directories]
> I hadn't worked on TIFF in a while. I tried out a few sample multipage TIFFs
> and found that we are not processing anything beyond the first page/image in
> a TIFF. Even worse, we're not populating our
> "imagereader:NumImages" metadata value for TIFFs.
> It looks like Drew Noakes' metadata-extractor is not yet handling these well:
> [https://github.com/drewnoakes/metadata-extractor/issues/648]
>
> There's an example file on that issue:
> [https://github.com/drewnoakes/metadata-extractor/files/14052854/color-pages-jpg.zip]
> And [~johanvanderknijff] also pointed to TIFFs available here:
> [https://www.leadtools.com/support/forum/posts/t10960-]
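
For illustration only: the description above says the image count is not being populated for TIFFs. Below is a minimal sketch, not Tika's or metadata-extractor's code, of how the number of top-level images in a classic TIFF could be counted by following the chain of image file directories (IFDs) from the file header. The class name TiffPageCounter, its methods, and the synthetic two-IFD sample are hypothetical. The sketch assumes the whole file is in memory and ignores BigTIFF and SubIFDs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.HashSet;
import java.util.Set;

public class TiffPageCounter {

    /** Counts top-level IFDs by following each IFD's "next IFD" offset. */
    public static int countTopLevelIfds(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("too short for a TIFF header");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1: "II" means little-endian, "MM" means big-endian.
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("bad byte-order mark");
        }
        // Bytes 2-3: magic number 42 for classic TIFF (BigTIFF is not handled).
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("not a classic TIFF");
        }
        Set<Long> seen = new HashSet<>();
        int count = 0;
        // Bytes 4-7: unsigned offset of the first IFD.
        long offset = buf.getInt(4) & 0xFFFFFFFFL;
        while (offset != 0) {
            // Stop on a circular chain or an offset outside the data.
            if (!seen.add(offset) || offset + 2 > tiff.length) {
                break;
            }
            int entries = buf.getShort((int) offset) & 0xFFFF;
            // An IFD is a 2-byte entry count, 12 bytes per entry, then a 4-byte next offset.
            long nextPos = offset + 2 + 12L * entries;
            if (nextPos + 4 > tiff.length) {
                break; // truncated IFD, not counted
            }
            count++;
            offset = buf.getInt((int) nextPos) & 0xFFFFFFFFL;
        }
        return count;
    }

    /** Builds a synthetic little-endian TIFF skeleton with two one-entry IFDs (no pixel data). */
    public static byte[] sampleTwoIfdTiff() {
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put((byte) 'I').put((byte) 'I').putShort((short) 42).putInt(8);
        writeIfd(b, 26); // first IFD at offset 8, points to the second at 26
        writeIfd(b, 0);  // second IFD ends the chain
        return b.array();
    }

    private static void writeIfd(ByteBuffer b, int nextOffset) {
        b.putShort((short) 1);   // one entry
        b.putShort((short) 256); // tag 256: ImageWidth
        b.putShort((short) 3);   // type 3: SHORT
        b.putInt(1);             // value count
        b.putInt(1);             // value, stored inline
        b.putInt(nextOffset);    // offset of the next IFD, 0 if none
    }

    public static void main(String[] args) {
        System.out.println(countTopLevelIfds(sampleTwoIfdTiff())); // prints 2
    }
}
```

A count obtained this way covers only the top-level IFD chain. As the linked blog post discusses, a file can also carry sub-images and reduced-resolution copies, so the number of IFDs is not always the number of pages.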