[
https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739288#comment-17739288
]
Gary Lucas commented on IMAGING-356:
------------------------------------
I haven't studied the changes that were made, so I can't offer any
authoritative recommendations on the approach. Instead, I have a few general
observations about the way TIFF files work that may be useful in figuring out
how to tackle the problem. Or perhaps not. So take them with a grain of salt.
TIFF files are kind of a special case in terms of image formats. First off, one
can never assume that a TIFF file is going to be accessed in-order. It is
common for the "directory" section of the file (which tells how it's
organized) to come last rather than first. And, of course, a TIFF file may have
multiple directories (because it may contain multiple images). Second, TIFF
files are typically quite large, often in the hundreds of megabytes range, and
sometimes in the gigabyte range. So it is often preferred to not keep the
entire thing in memory. In many cases, an application will not access the
entire file, but only a subsection. For example, a mapping program displaying
an aerial photograph might only access the subsection of the photograph that is
actually visible on the map. And finally, I note that TIFF files are often not
images at all, but are used to store numerical raster data (such as Earth
elevation and ocean depth data).
All of this means that the file-access pattern for a TIFF file is a closer fit
to the idea of a random access file rather than the idea of a sequential IO
channel such as a network socket or a serial device. I know that the PNG
format (the only other one I've studied in depth) was designed with network
access specifically in mind. The TIFF format evolved before network access was
in the ascendency as it is today.
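
To make the random-access point concrete, here is a minimal Java sketch of
locating a TIFF directory. It relies only on the classic TIFF header layout
(a 2-byte byte-order mark, the 2-byte magic number 42, and a 4-byte offset to
the first directory). The class name TiffDirectoryPeek and its methods are
hypothetical and are not part of Commons Imaging; the sample file is a
synthetic 34-byte stub with its directory placed after the data, not a
complete image.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration class; not part of Commons Imaging.
public class TiffDirectoryPeek {

    /** Reads the header, jumps to the first directory, and returns its entry count. */
    static int countFirstDirectoryEntries(Path file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] header = new byte[8];
            raf.readFully(header);

            // Bytes 0-1: "II" = little-endian, "MM" = big-endian.
            ByteOrder order;
            if (header[0] == 'I' && header[1] == 'I') {
                order = ByteOrder.LITTLE_ENDIAN;
            } else if (header[0] == 'M' && header[1] == 'M') {
                order = ByteOrder.BIG_ENDIAN;
            } else {
                throw new IOException("Missing TIFF byte-order mark");
            }

            ByteBuffer h = ByteBuffer.wrap(header).order(order);
            if (h.getShort(2) != 42) {
                throw new IOException("Not a classic TIFF (magic number is not 42)");
            }

            // Bytes 4-7: unsigned offset of the first directory, anywhere in the file.
            long directoryOffset = h.getInt(4) & 0xFFFFFFFFL;

            // The seek is the part a purely sequential stream cannot do cheaply.
            raf.seek(directoryOffset);
            byte[] count = new byte[2];
            raf.readFully(count);
            return ByteBuffer.wrap(count).order(order).getShort() & 0xFFFF;
        }
    }

    /** Builds a synthetic stub: header, 8 bytes of stand-in data, then the directory. */
    static byte[] sampleTiffBytes() {
        ByteBuffer buf = ByteBuffer.allocate(34).order(ByteOrder.LITTLE_ENDIAN);
        buf.put((byte) 'I').put((byte) 'I');
        buf.putShort((short) 42);
        buf.putInt(16);                 // directory starts after the stand-in data
        buf.position(16);
        buf.putShort((short) 1);        // one directory entry
        buf.putShort((short) 256);      // tag 256 (ImageWidth)
        buf.putShort((short) 3);        // field type 3 (SHORT)
        buf.putInt(1);                  // value count
        buf.putInt(1);                  // value
        buf.putInt(0);                  // no further directories
        return buf.array();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".tif");
        try {
            Files.write(file, sampleTiffBytes());
            System.out.println("Entries: " + countFirstDirectoryEntries(file));
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The same seek-then-read pattern would apply to fetching a single strip or tile
without touching the rest of the file.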
That being said, even the original Commons Imaging approach to TIFF file IO
wasn't quite a perfect fit. For one thing, the original authors open and close
a file multiple times (as they access each part of the file). That is
suboptimal since opening and closing a file carries its own performance
overhead. Also, when I was looking at refactoring Commons Imaging IO to
implement Closeable to support try-with-resources blocks, I didn't see a way
to accomplish that without a significant rewrite and compatibility-breaking
changes to the public API.
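
For illustration only, the open-once idea might look something like the
following Java sketch. RegionReader is a hypothetical class, not the Commons
Imaging API and not a proposal for how the library should be changed. It
assumes a single FileChannel is held open for the lifetime of the reader and
released by try-with-resources.

```java
import java.io.Closeable;
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration class; not part of Commons Imaging.
public class RegionReader implements Closeable {

    private final FileChannel channel;

    /** Opens the file exactly once; every later read reuses this channel. */
    public RegionReader(Path file) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
    }

    /** Reads length bytes starting at offset, in whatever order the caller asks. */
    public byte[] read(long offset, int length) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(length);
        long position = offset;
        while (buf.hasRemaining()) {
            // Positional read: does not disturb the channel's own position.
            int n = channel.read(buf, position);
            if (n < 0) {
                throw new EOFException("Region extends past end of file");
            }
            position += n;
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("region", ".bin");
        try {
            Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
            // One open and one close, however many regions are read in between.
            try (RegionReader reader = new RegionReader(file)) {
                byte[] tail = reader.read(8, 2);   // out-of-order access is fine
                byte[] head = reader.read(0, 3);
                System.out.println(tail.length + " + " + head.length + " bytes read");
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```

The try-with-resources block closes the channel even if a read throws, which
is the behavior that implementing Closeable is meant to provide.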
> TIFF reading extremely slow in version 1.0-SNAPSHOT
> ---------------------------------------------------
>
> Key: IMAGING-356
> URL: https://issues.apache.org/jira/browse/IMAGING-356
> Project: Commons Imaging
> Issue Type: Bug
> Components: Format: TIFF
> Affects Versions: 1.0
> Reporter: Gary Lucas
> Priority: Major
>
> I am using the latest code from github (1.0-SNAPSHOT downloaded from github
> of June 2023) to read a 300 megabyte TIFF file. Version 1.0-alpha3 required
> 673 milliseconds to read that file. The new code requires upward of 15
> minutes. Clearly something got broken since the last release.
> The TIFF file is a 10000x10000 pixel 4 byte image format organized in strips.
> The bottleneck appears to occur in the TiffReader getTiffRawImageData method
> which reads raw data from the file in preparation for creating a BufferedImage
> object.
> I suspect that there may be a general slowness of file access. In debugging,
> even reading the initial metadata (22 TIFF tags) took a couple of seconds.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)