[jira] [Comment Edited] (IMAGING-356) TIFF reading extremely slow in version 1.0-SNAPSHOT

Gary Lucas (Jira) Fri, 30 Jun 2023 09:58:06 -0700


    [ 
https://issues.apache.org/jira/browse/IMAGING-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17739170#comment-17739170
 ]


Gary Lucas edited comment on IMAGING-356 at 6/30/23 4:57 PM:
-------------------------------------------------------------

Looking at the code history on GitHub, I may have found one issue, though I am 
not sure that it is the major issue.   The ByteSource class has a method named 
size() with the following logic:

 
{code:java}
public long size() throws IOException {
    return origin.getByteArray().length;
}
{code}
 
So the question comes up, how much work is involved in a call to getByteArray?

It looks like "origin" is an object of type AbstractOrigin.FileOrigin, and its 
implementation of getByteArray is:
{code:java}
@Override
public byte[] getByteArray() throws IOException {
    return Files.readAllBytes(getPath());
}
{code}

That, of course, is a pretty expensive call, since it reads the entire file 
into memory. On the other hand, the size() method is only called 12 times when 
loading the PICT2883.TIF image.   But each call does pull back 14,788,608 bytes.
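
To put those figures in perspective, here is a minimal sketch of the arithmetic, assuming every size() call re-reads the whole file as the quoted code suggests. The class and method names are hypothetical and exist only for this illustration. With 12 calls at 14,788,608 bytes each, the total comes to 177,463,296 bytes read just to learn the file length.

```java
// Hypothetical helper for illustration only; not part of Commons Imaging.
final class SizeCallCost {

    /** Total bytes read when each size() call re-reads the entire file. */
    static long totalBytesRead(long fileSizeBytes, int sizeCalls) {
        return Math.multiplyExact(fileSizeBytes, (long) sizeCalls);
    }

    public static void main(String[] args) {
        long fileSize = 14_788_608L; // bytes returned per call, from the comment
        int calls = 12;              // size() calls observed for PICT2883.TIF
        long total = totalBytesRead(fileSize, calls);
        System.out.println(total + " bytes read to answer size() " + calls + " times");
    }
}
```

That is noticeable overhead, though on its own it may not account for a slowdown from milliseconds to minutes.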

I think that this may have occurred when commons.imaging was refactored to use 
commons.io. It may be that there's an impedance mismatch between the ideas from 
commons.io and the assumptions in the commons.imaging classes.
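
One possible direction, sketched here under assumptions rather than taken from either library: when the data comes from a file, the length can be obtained from file-system metadata with java.nio.file.Files.size(Path) instead of loading the content. The class and method names below are hypothetical; only the java.nio.file calls are real API. The sketch contrasts the two approaches on a temporary file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration; not part of Commons Imaging or Commons IO.
final class FileSizeSketch {

    /** Mirrors the quoted pattern: loads the whole file, keeps only the length. */
    static long sizeByReadingAll(Path path) {
        try {
            return Files.readAllBytes(path).length;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Asks the file system for the size; no file content is read. */
    static long sizeFromMetadata(Path path) {
        try {
            return Files.size(path);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a temporary file holding the given number of zero bytes. */
    static Path writeTempFile(int byteCount) {
        try {
            Path tmp = Files.createTempFile("size-sketch", ".bin");
            Files.write(tmp, new byte[byteCount]);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = writeTempFile(1 << 20); // 1 MiB stand-in for a TIFF file
        try {
            System.out.println("readAllBytes().length = " + sizeByReadingAll(tmp));
            System.out.println("Files.size()          = " + sizeFromMetadata(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Both methods return the same value for a regular file, but only the first one's cost grows with the file size. Whether this fits the origin abstraction for non-file sources is a separate question.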


was (Author: gwlucas):
Looking at the code history in github, I may have found one issue, though I am 
not sure that it is the major issue.   In the ByteSource class, there is a call 
to a function called size with the following logic

 
{code:java}
public long size() throws IOException {
    return origin.getByteArray().length;
}
{code}
 
So the question comes up, how much work is involved in a call to getByteArray?

It looks like "origin" is an object of type AbstractOrigin.FileOrigin() and the 
call it makes to getByteArray is
{code:java}
@Override
public byte[] getByteArray() throws IOException {
    return Files.readAllBytes(getPath());
}
{code}

Which, of course, is a pretty expensive call. On the other hand, the size() 
method is only called 12 times when loading the PICT2883.TIF image.   But each 
call does pull back 14788608 bytes.

I think that this may be a case where there's an impedance mismatch between the 
ideas from commons.io and the assumptions in the commons.imaging classes.

> TIFF reading extremely slow in version 1.0-SNAPSHOT
> ---------------------------------------------------
>
>                 Key: IMAGING-356
>                 URL: https://issues.apache.org/jira/browse/IMAGING-356
>             Project: Commons Imaging
>          Issue Type: Bug
>          Components: Format: TIFF
>    Affects Versions: 1.0
>            Reporter: Gary Lucas
>            Priority: Major
>
> I am using the latest code from GitHub (1.0-SNAPSHOT, downloaded in June 
> 2023) to read a 300 megabyte TIFF file.  Version 1.0-alpha3 required 
> 673 milliseconds to read that file.  The new code requires upward of 15 
> minutes.   Clearly something got broken since the last release.
> The TIFF file is a 10000x10000 pixel 4 byte image format organized in strips. 
> The bottleneck appears to occur in the TiffReader getTiffRawImageData method, 
> which reads raw data from the file in preparation for creating a BufferedImage 
> object.
> I suspect that there may be a general slowness of file access.  In debugging, 
> even reading the initial metadata (22 TIFF tags) took a couple of seconds.



