Gary Lucas created SANSELAN-76:
----------------------------------

             Summary: Reduce memory use of TIFF readers
                 Key: SANSELAN-76
                 URL: https://issues.apache.org/jira/browse/SANSELAN-76
             Project: Commons Sanselan
          Issue Type: Improvement
          Components: Format: TIFF
            Reporter: Gary Lucas


This Tracker Item proposes changes to the TIFF file readers to address memory 
issues when reading very large images from TIFF files.  The TIFF format is used 
extensively in technical applications such as aerial photographs, satellite 
images, and digital raster maps which feature very large image sizes.  For 
example, the public-domain Natural Earth Data set features raster files sized 
21,600 by 10,800 pixels (233,280,000 pixels, or 222.5 megapixels counting 2^20 
pixels per megapixel).  Although this example is 
unusually large, image sizes of 25 to 100 megapixels are common for such 
applications.

Unfortunately, when Sanselan reads a TIFF image, it consumes nearly twice as 
much memory as is necessary.  The reader operates in two stages.  First, it 
reads the entire source file into memory; then it builds the output image, also 
in memory.  In the example file mentioned above, the source data runs from 
83.19 to 373 megabytes (depending on compression).  Thus Sanselan would 
require a minimum of 83.19 + 4*222.5 = 973 megabytes to produce an image for 
one of these files (allowing 4 bytes per pixel in the output BufferedImage).
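
The estimate can be reproduced with a few lines of Java.  This is only a 
sketch of the arithmetic; the class and method names are made up for 
illustration, and a megabyte is taken to be 2^20 bytes.

```java
/** Illustrative helper (hypothetical name) for the two-stage memory estimate. */
final class TiffMemoryEstimate {
    static final long BYTES_PER_MB = 1024L * 1024L;

    /** Bytes needed for an output image stored at 4 bytes per pixel. */
    static long outputImageBytes(int width, int height) {
        return 4L * width * height;
    }

    /** Peak bytes when the whole source file and the whole output image are both in memory. */
    static long twoStagePeakBytes(long sourceBytes, int width, int height) {
        return sourceBytes + outputImageBytes(width, height);
    }

    public static void main(String[] args) {
        int width = 21600;
        int height = 10800;
        // Smallest source size quoted in the text (83.19 megabytes).
        long sourceBytes = Math.round(83.19 * BYTES_PER_MB);
        double imageMb = outputImageBytes(width, height) / (double) BYTES_PER_MB;
        double peakMb = twoStagePeakBytes(sourceBytes, width, height) / (double) BYTES_PER_MB;
        System.out.printf("output image: %.1f MB, two-stage peak: %.1f MB%n", imageMb, peakMb);
    }
}
```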

Fortunately, TIFF files are organized so that they can be read a piece at a 
time.  TIFF files are divided into either strips or tiles and, if data 
compression is used, each piece is compressed individually.  Thus each 
individual piece has no dependency on the others. 
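
As an illustration of how a tiled image breaks into independent pieces, the 
sketch below computes the tile layout for a given image size.  TileGrid is a 
hypothetical helper, not an existing Sanselan class, and the 512 by 512 tile 
size in the usage note is an assumed example value.  Tiles are numbered left to 
right, then top to bottom, and the bounds of edge tiles are clipped to the 
image.

```java
import java.awt.Rectangle;

/** Hypothetical helper describing how an image divides into tiles. */
final class TileGrid {
    final int imageWidth;
    final int imageHeight;
    final int tileWidth;
    final int tileHeight;

    TileGrid(int imageWidth, int imageHeight, int tileWidth, int tileHeight) {
        this.imageWidth = imageWidth;
        this.imageHeight = imageHeight;
        this.tileWidth = tileWidth;
        this.tileHeight = tileHeight;
    }

    /** Number of tile columns, rounding up to cover a partial last column. */
    int tilesAcross() {
        return (imageWidth + tileWidth - 1) / tileWidth;
    }

    /** Number of tile rows, rounding up to cover a partial last row. */
    int tilesDown() {
        return (imageHeight + tileHeight - 1) / tileHeight;
    }

    int tileCount() {
        return tilesAcross() * tilesDown();
    }

    /** Image region covered by the tile with the given index, clipped to the image. */
    Rectangle tileBounds(int index) {
        int x = (index % tilesAcross()) * tileWidth;
        int y = (index / tilesAcross()) * tileHeight;
        return new Rectangle(x, y,
                Math.min(tileWidth, imageWidth - x),
                Math.min(tileHeight, imageHeight - y));
    }

    /** Bytes needed to hold one full decoded tile at 4 bytes per pixel. */
    long bytesPerDecodedTile() {
        return 4L * tileWidth * tileHeight;
    }
}
```

For a 21,600 by 10,800 image with 512 by 512 tiles, new TileGrid(21600, 10800, 
512, 512) describes a grid of 43 by 22 tiles, and one decoded tile occupies 
1 megabyte at 4 bytes per pixel.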

This item proposes to implement two changes:

1)  Allow the TIFF data reader to read the files one piece at a time while 
constructing the buffered image.  Thus the memory use for reading would be no 
larger than the piece size.  This would be an internal change, so the external 
appearance of the Sanselan getBufferedImage methods would not change.

2) Provide new API elements that permit applications to read the strips or 
tiles from TIFF files individually.  This change would support applications 
that need to access very large TIFF files without committing the memory to 
store a BufferedImage for the entire file (a 222.5 megapixel image requires 890 
megabytes, which is a lot even by contemporary standards).
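
The two changes could look roughly like the sketch below.  PieceSource and 
PiecewiseReader are hypothetical names invented for this illustration and are 
not part of the current Sanselan API; the sketch also assumes each piece has 
already been decompressed and converted to ARGB pixels.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

/** Hypothetical per-piece reader; names and methods are illustrative only. */
interface PieceSource {
    int imageWidth();
    int imageHeight();
    int pieceCount();
    /** Region of the image covered by the given strip or tile. */
    Rectangle pieceBounds(int index);
    /** Decoded ARGB pixels for that region, in row-major order. */
    int[] readPiece(int index);
}

final class PiecewiseReader {
    /** Change 1: build the full image while holding only one decoded piece at a time. */
    static BufferedImage assemble(PieceSource src) {
        BufferedImage image = new BufferedImage(
                src.imageWidth(), src.imageHeight(), BufferedImage.TYPE_INT_ARGB);
        for (int i = 0; i < src.pieceCount(); i++) {
            Rectangle r = src.pieceBounds(i);
            int[] argb = src.readPiece(i); // eligible for garbage collection after this iteration
            image.setRGB(r.x, r.y, r.width, r.height, argb, 0, r.width);
        }
        return image;
    }

    /** Change 2: process the pieces directly, never allocating the full image. */
    static long countOpaquePixels(PieceSource src) {
        long count = 0;
        for (int i = 0; i < src.pieceCount(); i++) {
            Rectangle r = src.pieceBounds(i);
            int[] argb = src.readPiece(i);
            int pixels = r.width * r.height;
            for (int p = 0; p < pixels; p++) {
                if ((argb[p] >>> 24) == 0xFF) {
                    count++;
                }
            }
        }
        return count;
    }
}
```

In the first method the caller still receives an ordinary BufferedImage, so 
the external behavior is unchanged.  The second method stands in for any 
application-level processing that only ever needs one strip or tile in memory.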

There is one minor issue in this implementation that is easily addressed.  
Sanselan reads images from ByteSources that can be either random-access files 
or sequential-access input streams.  In the case of sequential-access input 
streams, it may be hard to perform a partial read on a TIFF directory.  In such 
a case, the TIFF access routines might have to resort to reading the entire 
source data into memory as they currently do.  This would simply be a 
limitation of the 
implementation.
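
For the random-access case, a partial read amounts to a seek followed by a 
bounded read.  The sketch below uses the standard java.io.RandomAccessFile 
class directly rather than Sanselan's ByteSource classes; RangeReader and 
readRange are hypothetical names chosen for this example.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical helper showing a partial read from a random-access file. */
final class RangeReader {
    /** Reads exactly length bytes starting at offset, without loading the rest of the file. */
    static byte[] readRange(File file, long offset, int length) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            if (offset < 0 || length < 0 || offset + length > raf.length()) {
                throw new IOException("Requested range lies outside the file");
            }
            byte[] buffer = new byte[length];
            raf.seek(offset);      // jump directly to the strip or tile
            raf.readFully(buffer); // read only the bytes of that piece
            return buffer;
        }
    }
}
```

A sequential input stream offers no equivalent of seek that can move 
backwards, which is why the fallback of reading everything into memory may 
still be needed there.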

There is one issue that may make this change a bit problematic.  The TIFF 
processors depend on accessing a class called TiffDataElement that contains a 
public array of bytes called "data".  The most expeditious way of implementing 
the enhancement is to make this element private and add an accessor that 
either returns the data from internal memory or else loads it on demand.  
Unfortunately, because the data element is declared public, there is a chance 
that some existing applications are using it directly.  In hindsight, it is 
clear that declaring this element public was a mistake, but it may be too late 
to fix it.  So care will be required to ensure that compatibility remains.  
The most likely solution seems to be to implement a new class for passing raw 
data from the source TIFF files to the DataReader implementations.
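
A minimal sketch of the accessor idea follows.  LazyDataElement is a 
hypothetical stand-in written for this description; it is not the existing 
TiffDataElement class, and the loader callback is an assumption about how the 
on-demand read might be supplied.

```java
import java.util.function.Supplier;

/** Hypothetical element whose bytes are private and may be loaded on demand. */
final class LazyDataElement {
    private final int offset;
    private final int length;
    private final Supplier<byte[]> loader; // null when the data was supplied up front
    private byte[] data;

    /** Element whose bytes are fetched only when first requested. */
    LazyDataElement(int offset, int length, Supplier<byte[]> loader) {
        this.offset = offset;
        this.length = length;
        this.loader = loader;
    }

    /** Element whose bytes are already in memory. */
    LazyDataElement(int offset, byte[] data) {
        this.offset = offset;
        this.length = data.length;
        this.loader = null;
        this.data = data;
    }

    int getOffset() {
        return offset;
    }

    int getLength() {
        return length;
    }

    /** Returns the bytes, loading them on first use. Not thread-safe. */
    byte[] getData() {
        if (data == null) {
            byte[] loaded = loader.get();
            if (loaded.length != length) {
                throw new IllegalStateException("Loader returned " + loaded.length
                        + " bytes, expected " + length);
            }
            data = loaded;
        }
        return data;
    }
}
```

This version caches the bytes after the first load.  A reader that wants to 
keep memory use low could instead discard them after each piece is decoded.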

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
