[ 
https://issues.apache.org/jira/browse/IMAGING-271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17218729#comment-17218729
 ] 

Gary Lucas commented on IMAGING-271:
------------------------------------

My plan is that the proposed writer will use components from the existing 
writers (including an instance of the poorly named TiffImageWriterLossy 
class), so I think long-term compatibility will be preserved.  I have no plan 
to remove or replace the existing writers at this time, so this change 
would not break compatibility.

The main reason that I think that streams are not feasible for the new writer 
is that it needs random access to the file.  For example, the 
first thing that gets written to the file (after the header) is a directory 
that gives the file positions and sizes of the segments (strips or tiles) that 
make up the body of the image.  Most images include multiple segments.  When 
the body of the image is written, each segment (strip or tile) can be processed 
using data compression.  Because the final size of a compressed output can 
vary, there is no way for the application to know the file position and size of 
the image segments ahead of time.  So it cannot store an accurate directory 
before writing the image information.
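
As a rough illustration of why the sizes can't be known up front (this is not 
code from the proposed class), the sketch below compresses two strips of the 
same raw size.  It uses java.util.zip.Deflater as a stand-in for whatever 
compression the writer applies; the class and method names are hypothetical.

```java
import java.util.Random;
import java.util.zip.Deflater;

/** Hypothetical sketch: equal-sized inputs compress to different sizes. */
final class CompressedSizeSketch {

    /** Returns the number of bytes produced by deflating the raw strip. */
    static int deflatedSize(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        // Drain the compressor; only the byte count matters here.
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    /** Builds a strip of pseudo-random bytes, which compresses poorly. */
    static byte[] noisyStrip(int size, long seed) {
        byte[] strip = new byte[size];
        new Random(seed).nextBytes(strip);
        return strip;
    }
}
```

A strip of 8192 zero bytes deflates to far fewer bytes than an 8192-byte noisy 
strip, so the offset of the second segment depends on the content of the first.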

I think that this may be why the existing implementation (which does write to 
streams) keeps the whole image data set in memory before it writes the file.  
It has to collect all the information for output before it can determine the 
file-position information and write it to the directory in a serial manner.

So what I plan to do is to initialize the directory with notional file 
positions/lengths, write the directory to the file, then write the body of the 
image while keeping track of the true file positions/lengths.  Once the whole 
image is written (and the position/size data collected), the class would go 
back and overwrite the notional data in the directory with the correct 
information.  This action, overwriting arbitrary sections of the file, can't 
be supported by the serial-data-access pattern of a stream.
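
The write-placeholders-then-patch sequence described above can be sketched 
with java.io.RandomAccessFile.  This is only a minimal sketch under stated 
assumptions: the file layout (an int count followed by a table of 
offset/length pairs) is a simplified stand-in, not a real TIFF directory, and 
the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of writing a notional table and back-patching it. */
final class BackPatchSketch {

    /**
     * Writes a placeholder table, then the segments, then overwrites the
     * table with the true (offset, length) pairs. Returns those pairs.
     */
    static long[][] writeSegments(File file, byte[][] segments) throws IOException {
        long[][] table = new long[segments.length][2];
        try (RandomAccessFile raf = new RandomAccessFile(file, "rw")) {
            raf.setLength(0);
            raf.writeInt(segments.length);
            long tablePos = raf.getFilePointer();

            // 1. Notional entries: the real values are not known yet.
            for (int i = 0; i < segments.length; i++) {
                raf.writeLong(0L); // offset placeholder
                raf.writeLong(0L); // length placeholder
            }

            // 2. Body: record where each segment actually lands.
            for (int i = 0; i < segments.length; i++) {
                table[i][0] = raf.getFilePointer();
                table[i][1] = segments[i].length;
                raf.write(segments[i]);
            }

            // 3. Seek back and overwrite the placeholders.
            raf.seek(tablePos);
            for (long[] entry : table) {
                raf.writeLong(entry[0]);
                raf.writeLong(entry[1]);
            }
        }
        return table;
    }

    /** Reads one segment back by looking up its entry in the table. */
    static byte[] readSegment(File file, int index) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(4 + 16L * index); // skip the count, then 16 bytes per entry
            long offset = raf.readLong();
            long length = raf.readLong();
            byte[] data = new byte[(int) length];
            raf.seek(offset);
            raf.readFully(data);
            return data;
        }
    }
}
```

Step 3 is the part a plain OutputStream cannot do, since it offers no way to 
seek back to a position that has already been written.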

A second, similar problem is that we wish to support the ability of a TIFF file 
to include multiple images (that's the issue that was described in 
IMAGING-235).  In a TIFF file, each directory includes the file offset of the 
directory that is to follow it.  But again, we can't know the size of an image 
until we've completed writing it.  And so, we can't know the file offset of the 
second directory until we've completed writing the first.  So the proposed 
writer must go back and modify the information in the first directory to give 
the file position of the second directory.
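
A rough sketch of that directory chaining follows.  It keeps the general TIFF 
shape (an 8-byte header whose last four bytes point at the first directory, 
and a 4-byte next-directory offset at the end of each directory, with 0 
meaning "no more"), but as a simplifying assumption each directory here has 
zero entries, so the output is not a valid image.  All names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical sketch of linking directories by patching offsets. */
final class DirectoryChainSketch {

    /** Writes a big-endian header; returns the position of the first-directory slot. */
    static long writeHeader(RandomAccessFile raf) throws IOException {
        raf.setLength(0);
        raf.writeByte('M');
        raf.writeByte('M');
        raf.writeShort(42);
        long slot = raf.getFilePointer();
        raf.writeInt(0); // patched when the first directory is appended
        return slot;
    }

    /**
     * Appends a directory and its image body at the end of the file, then
     * patches pointerSlot (in the header or the previous directory) to point
     * at it. Returns the position of the new directory's own next-offset slot.
     */
    static long appendImage(RandomAccessFile raf, long pointerSlot, byte[] body)
            throws IOException {
        long dirPos = raf.length();
        if ((dirPos & 1) != 0) { // directories start on an even offset
            raf.seek(dirPos);
            raf.writeByte(0);
            dirPos++;
        }
        raf.seek(dirPos);
        raf.writeShort(0); // entry count (none in this sketch)
        long nextSlot = raf.getFilePointer();
        raf.writeInt(0);   // no following directory yet
        raf.write(body);

        // Go back and record where this directory landed.
        raf.seek(pointerSlot);
        raf.writeInt((int) dirPos);
        return nextSlot;
    }

    /** Walks the chain from the header and counts the directories. */
    static int countDirectories(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            raf.seek(4);
            long offset = raf.readInt() & 0xFFFFFFFFL;
            int count = 0;
            while (offset != 0) {
                raf.seek(offset);
                int entries = raf.readUnsignedShort();
                raf.skipBytes(12 * entries); // each entry is 12 bytes
                offset = raf.readInt() & 0xFFFFFFFFL;
                count++;
            }
            return count;
        }
    }
}
```

A caller would keep the slot returned by each appendImage call and pass it to 
the next one, so every directory is linked only after its successor's 
position is known.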

Right now, I have a hacked-up version of the library that includes some test 
programs that perform the operations that would be used by the new 
writer-appender class.  So it looks like it is feasible.  The challenge is 
figuring out whether I've missed any important features (for example, I can 
admit right now that I'm not sure how to handle EXIF directories).

> Proposed new class for memory-efficient TIFF image writing
> ----------------------------------------------------------
>
>                 Key: IMAGING-271
>                 URL: https://issues.apache.org/jira/browse/IMAGING-271
>             Project: Commons Imaging
>          Issue Type: New Feature
>          Components: Format: TIFF
>    Affects Versions: 1.0-alpha3
>            Reporter: Gary Lucas
>            Priority: Major
>         Attachments: TiffImageWriterAppender.java
>
>
> I am proposing to implement a new class for writing TIFF images in a 
> memory-efficient manner. This class will permit the creation of large-scale 
> TIFF images without undue requirements for memory.  It will also support the 
> creation of TIFF files containing multiple images (in TIFF terminology, 
> "directories").
> I am posting this Jira item to request suggestions for the design of the 
> class as well as to identify requirements from potential users.
> The current TIFF image-writer classes operate on an entire BufferedImage and 
> actually make a copy of the image data before writing it to an output stream. 
>  They do not permit the creation of output images "a piece at a time".  So 
> for very large images, they can require considerable use of memory.
> The proposed approach would allow an application to "append" data to a TIFF 
> file making multiple calls to output methods. The TIFF specification calls 
> for files to be organized into "sections" (strips or tiles). This class will 
> permit an application to write data to the TIFF file a section at a time.
>  I have attached a stub Java class to provide an example of the proposed 
> design. This example is intended to promote discussion and help identify 
> relevant features for the initial implementation.
> Some guiding principles and concepts for design:
>  # Simplicity.  The greatest strength of the TIFF file format, its 
> versatility, is also its greatest weakness. The need to support such a large 
> variety of data formats and operations leads to a complicated API that is 
> sometimes difficult to use.  The proposed design limits some of the output 
> functionality in order to maintain a simpler API. 
>  # Essential features only.  Currently, I have limited resources to devote to 
> this implementation. My intention is to implement only those functions that 
> make the class viable. So the challenge here will be determining what those 
> features are. Comments are welcome. On the other hand, I am striving for a 
> design that will facilitate future development to add other features as they 
> are identified by the user community. 
>  # Operates on files, and only files.  The current Commons Imaging API 
> supports output to various kinds of Java OutputStreams, including memory and 
> socket streams. This class is purposely designed to write to random-access 
> files. This consideration is particularly important to support cases where 
> the ultimate size of the output content cannot be determined a priori.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
