[
https://issues.apache.org/jira/browse/SANSELAN-58?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13259306#comment-13259306
]
Gary Lucas commented on SANSELAN-58:
------------------------------------
Since the patch of 22 April 2012 claims enhanced speed of loading, I thought
that some hard data might be in order. One thing to keep in mind is that TIFF
files are often huge. The 25 megapixel file used for the test described below
is not even considered especially large. Thus any operation that is performed
once-per-pixel is repeated 25 million times. So if we can avoid even a single
if/then operation, the savings adds up.
The Enhancements ------------------------
The TIFF specification provides several formats for the storage of data. This
patch optimizes the software for two special cases that represent the most
widely used TIFF formats: 24-bit RGB and 8-bit grayscale or indexed palette.
The 24-bit RGB enhancement involves 3 changes:
1) The old code implemented a generic getSamplesAsBytes method to extract data
from the raw bit stream. This method was invoked once per pixel. Each time it
was called, it needed to execute logic to see which format it was reading and
which code branch to traverse. By detecting the special case 24-bit RBG format
before enterring the loop, this per-pixel overhead was avoided.
2) Reorganize the access loop with a nested row/column loop, eliminating one
conditional operation per pixel
3) The old code invoked a method called a "photometricInterpreter" to pack
R,G,B values into a single integer for storage into the final product. Replace
this method width byte-manipulation logic built into the loop (i.e. in-line the
logic for the byte manipulation). I tried several different ways of coding the
byte-shifting to find one that was particularly fast.
The Test Procedure ---------------------------
The testing was performed on a 5000-by-5000 pixel 24-bite RGB TIFF image.
The testing procedure loads the image 10 times, and records how long it takes
for each operation. When accumulating an average load time, it throws out the
first two load operations. So even though 10 tests are performed, the average
load time is based on 8 tests. Ignoring the first load operation makes sense
because it is affected by things like class loading and JIT compiling and
always takes longer than those that follow. In other words, it is contaminated
by timing factors other than those we wish to measure. The second load
operation is also ignored because, under Linux, I've observed that the second
load observation sometimes shows evidence timing contamination (though to a
lesser degree than the first). Between load operations, the test routine
explicitly invokes the Java Runtime garbage collection method and then executes
a 1 seconds sleep to give the garbage collection time to complete. The purpose
of the explicit garbage collection operation is to avoid contaminating time
measurements during a load test where the garbage collector might be cleaning
up memory from a previous load operation.
The Results:
Original Implementation: 2261.356 ms.
Remove call to getSamplesAsBytes: 948.304 ms.
Reorganize access loop: 879.196 ms.
In-Line photometric interpreter: 624.951 ms.
Total reduction: 72 percent
> Streamlined TIFF strip reader reduces load time by a factor of 5
> ----------------------------------------------------------------
>
> Key: SANSELAN-58
> URL: https://issues.apache.org/jira/browse/SANSELAN-58
> Project: Commons Sanselan
> Issue Type: Improvement
> Reporter: Gary Lucas
> Attachments: Sanselan-58-TiffStripReaderSpeed.patch,
> Tracker_Item_58_22_Apr_2012.patch
>
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> Testing reveals that streamlining the DataReaderStrip.java operations for 8
> and 24 bit-per-pixel TIFF images reduces the TIFF file load time by a factor
> of 5.
> For each pixel in images of these types, the interpretStrip() method of
> DataReaderStrip makes calls to a generic bit extractor using its
> getSamplesAsBytes() method. Internally, this method simply copies the
> requisite number of bytes (8 or 24), but it executes a lot of conditional
> statements to do so. Under most architectures, conditionals tend to take 2
> to 3 times as long to execute as simple arithmetic statements, so this
> approach is expensive (especially since an image may contain millions of
> pixels). While the implementation is very generic, the majority of TIFF
> files out there appear to fall into two simple categories. By implementing
> specialized code for these two cases, the loading time for TIFF images is
> dramatically reduced.
> The following snippet shows the code I used for testing. It was added right
> at the beginning of the interpretStrip() method.
> // Oct 2011 changes.
> // The general case decoder is based on the idea of using a
> // generic bit-reader to unpack the number of bytes that are
> // needed. Although it is efficiently implemented, it does
> // require performing at least three conditional branches per sample
> // extracted (and often more). This change attempts to bypass that
> // overhead by implementing specialized blocks of extraction code
> // for commonly used 8 bitsPerPixel and 24 bitsPerPixel cases.
> // In other cases, it will simply fall through to the original code.
> // note that when promoting a byte to an integer, it is necessary
> // to mask it with 0xff because the Java byte type is signed
> // an this implementation requires an unsigned value
> if(x>=width)
> {
> // this may not be required. it was coded based on the
> // original implementation. But looking at the TIFF 6.0 spec,
> // it looks like the rows always evenly fill out the strip,
> // so there should never be a partial row in a strip and x
> // should not be anything except zero.
> x = 0;
> y++;
> }
> if(y>=height)
> {
> // we check it once before starting, so that we don't have
> // to check it redundantly for each pixel
> return;
> }
>
> if(predictor==-1 && this.bitsPerPixel==8)
> {
> int [] samples = new int[1];
> for(int i=0; i<pixels_per_strip; i++)
> {
> samples[0] = bytes[i]&0x000000ff;
> photometricInterpreter.interpretPixel(bi, samples, x, y);
> x++;
> if(x>=width)
> {
> x = 0;
> y++;
> if(y>=height)
> return; // any remaining bytes are not needed
> }
> }
> return;
> }
> else if(predictor==-1 && this.bitsPerPixel==24)
> {
> int [] samples = new int[3];
> int k = 0;
> for(int i=0; i<pixels_per_strip; i++)
> {
> samples[0] = bytes[k++]&0x000000ff;
> samples[1] = bytes[k++]&0x000000ff;
> samples[2] = bytes[k++]&0x000000ff;
> photometricInterpreter.interpretPixel(bi, samples, x, y);
> x++;
> if(x>=width)
> {
> x = 0;
> y++;
> if(y>=height)
> return; // any remaining bytes are not needed
> }
> }
> return;
> }
>
> // original code before Oct 2011 modification
> ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
> BitInputStream bis = new BitInputStream(bais);
> etc.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira