Gary Lucas created IMAGING-257:
----------------------------------

             Summary: Investigate speed improvements to LZW decompression
                 Key: IMAGING-257
                 URL: https://issues.apache.org/jira/browse/IMAGING-257
             Project: Commons Imaging
          Issue Type: Improvement
          Components: Format: TIFF
            Reporter: Gary Lucas


In accessing large TIFF files (10812-by-10812 pixels), read times were about 11 
seconds (with a solid-state disk drive), and I was looking for ways to reduce 
that.  I ran the NetBeans profiler and discovered that 87% of the read time was 
spent in the MyLzwDecompressor decompress() method. 

Inspecting MyLzwDecompressor, I saw that it used the Java 
ByteArrayOutputStream, which has a reputation for being slow (its write 
methods are synchronized).  You can find lots of examples of classes named 
FastByteArrayOutputStream on the web, including one right here in the Commons 
Imaging project.  

I tried a number of different experiments using the 
ApacheImagingSpeedAndMemoryTest class (from the examples directory).

Replacing ByteArrayOutputStream with FastByteArrayOutputStream produced a 4 
percent reduction in run time.

I then tried using a local array instead of a "byte array" class.  That 
improved things to about an 8 percent reduction in time.  Finally, I tried a 
few more aggressive changes, reducing the number of conditional tests and 
replacing calls such as stringFromCode(), which wraps the class member "table", 
with direct access.  The final result was an 11 percent total reduction in time.
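
As a rough sketch of the local-array idea, assuming the decompressor only 
needs to append decoded byte strings and hand back the result: the class name 
LocalArraySink and its methods are hypothetical, are not part of Commons 
Imaging, and are not the code that was measured above.

```java
import java.util.Arrays;

/**
 * Hypothetical, unsynchronized replacement for a byte-array stream.
 * Illustrative only; not taken from MyLzwDecompressor.
 */
final class LocalArraySink {
    private byte[] buffer;
    private int count;

    LocalArraySink(int expectedSize) {
        // Pre-size to the expected output length so growth is rare.
        buffer = new byte[Math.max(expectedSize, 16)];
    }

    /** Appends length bytes of src, starting at offset. */
    void write(byte[] src, int offset, int length) {
        int needed = count + length;
        if (needed > buffer.length) {
            // Grow to at least double the current capacity.
            buffer = Arrays.copyOf(buffer, Math.max(needed, buffer.length * 2));
        }
        System.arraycopy(src, offset, buffer, count, length);
        count += length;
    }

    int size() {
        return count;
    }

    /** Returns a copy trimmed to the bytes actually written. */
    byte[] toByteArray() {
        return Arrays.copyOf(buffer, count);
    }
}
```

If the expected output size is known up front, the growth branch should 
rarely run, and the inner loop reduces to a single System.arraycopy call per 
decoded string.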

11 percent isn't all that impressive, but I haven't been able to find anything 
else.  Modern compilers are so smart and do such a good job optimizing code 
that it's hard to find "easy wins."  

Anyway, this is a potential area for improvement in the Commons Imaging API.  
Care will be required because there are some features that my test bypassed. 
For example, there's a diagnostic "listener" in the current implementation that 
would have to be supported.  Also, I took out a lot of bounds checking, and 
just assumed that the input compressed data would produce correct output.  In 
real life, that's not a safe assumption.  I would probably try wrapping the 
logic of the decompress method in a try{}catch{} block looking for 
ArrayIndexOutOfBoundsException and have the method re-throw it as an 
IOException (which is what it does now).  It will also be challenging to find 
a way of properly 
testing modifications to this class.
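
The try/catch idea might be sketched as follows.  GuardedDecoder, 
decodeUnchecked(), and the simplified table-of-strings input are all 
hypothetical stand-ins, not the real MyLzwDecompressor logic.  The sketch 
catches IndexOutOfBoundsException, the superclass of 
ArrayIndexOutOfBoundsException, so that failures reported by 
System.arraycopy are covered as well.

```java
import java.io.IOException;

/** Hypothetical illustration of guarding an unchecked decode loop. */
final class GuardedDecoder {

    /** Inner loop with no explicit bounds checks (illustrative only). */
    private static byte[] decodeUnchecked(byte[][] table, int[] codes,
                                          int expectedSize) {
        byte[] out = new byte[expectedSize];
        int pos = 0;
        for (int code : codes) {
            byte[] entry = table[code]; // throws on an out-of-range code
            // Throws if the entry would overrun the output array.
            System.arraycopy(entry, 0, out, pos, entry.length);
            pos += entry.length;
        }
        return out;
    }

    /** Converts index failures caused by corrupt input into IOException. */
    static byte[] decode(byte[][] table, int[] codes, int expectedSize)
            throws IOException {
        try {
            return decodeUnchecked(table, codes, expectedSize);
        } catch (IndexOutOfBoundsException e) {
            throw new IOException("Corrupt LZW data", e);
        }
    }
}
```

This keeps the hot loop free of per-byte checks while callers still see a 
checked exception for bad input.  A genuine indexing bug in the decoder would 
also surface as an IOException, which is one reason careful testing would 
still be needed.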

I will be looking at this, but probably will not move on it until I get 
feedback from the community.  I don't view this change as unduly risky 
provided that proper care is taken in making the modifications.  But the gain 
in performance is small enough that I'm not sure it's worth it.

I will also take a look at Commons Compression to see what they do.

If you have any thoughts on this matter, please let me know.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
