Gary Lucas created IMAGING-257:
----------------------------------
Summary: Investigate speed improvements to LZW decompression
Key: IMAGING-257
URL: https://issues.apache.org/jira/browse/IMAGING-257
Project: Commons Imaging
Issue Type: Improvement
Components: Format: TIFF
Reporter: Gary Lucas
In accessing large TIFF files (10812-by-10812 pixels), read times were about 11
seconds (with a solid-state disk drive), and I was looking for ways to reduce
that. I ran the NetBeans profiler and discovered that 87% of the read time was
spent in the MyLzwDecompressor decompress() method.
Inspecting MyLzwDecompressor, I saw that it used the Java
ByteArrayOutputStream, which has a reputation for being slow (its write
methods are synchronized). You can find lots of examples of classes named
FastByteArrayOutputStream on the web, including one right here in the
Commons Imaging project.
I tried a number of different experiments using the
ApacheImagingSpeedAndMemoryTest class (from the examples directory).
Replacing ByteArrayOutputStream with FastByteArrayOutputStream produced a 4
percent reduction in run time.
I then tried using a local array instead of a "byte array" class. That
improved things to about an 8 percent reduction in time. Finally, I tried a few
more aggressive changes, reducing the number of conditional tests and replacing
calls such as stringFromCode(), which wraps the class member "table", with
direct access. The final result was an 11 percent total reduction in time.
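As a rough illustration of the "local array" idea, here is a minimal sketch of
a growable byte buffer that appends with System.arraycopy and no
synchronization. The class name LocalBufferSketch and its methods are
hypothetical, and it assumes the caller can supply a reasonable size estimate;
it is not the code used in the experiment above.

```java
import java.util.Arrays;

// Hypothetical sketch: a plain, unsynchronized output buffer.
final class LocalBufferSketch {
    private byte[] buffer;
    private int count;

    // expectedSize is an assumed estimate of the decompressed length.
    LocalBufferSketch(int expectedSize) {
        buffer = new byte[Math.max(16, expectedSize)];
    }

    // Append length bytes from src, growing the array only when needed.
    void write(byte[] src, int offset, int length) {
        if (count + length > buffer.length) {
            int newSize = Math.max(buffer.length * 2, count + length);
            buffer = Arrays.copyOf(buffer, newSize);
        }
        System.arraycopy(src, offset, buffer, count, length);
        count += length;
    }

    // Return exactly the bytes written so far.
    byte[] toByteArray() {
        return count == buffer.length ? buffer : Arrays.copyOf(buffer, count);
    }
}
```

If the size estimate is accurate, the array never grows and toByteArray()
returns the buffer without an extra copy.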
11 percent isn't all that impressive, but I haven't been able to find anything
else. Modern compilers are so smart and do such a good job optimizing code
that it's hard to find "easy wins."
Anyway, this is a potential area for improvement in the Commons Imaging API.
Care will be required because there are some features that my test bypassed.
For example, there's a diagnostic "listener" in the current implementation that
would have to be supported. Also, I took out a lot of bounds checking, and
just assumed that the input compressed data would produce correct output. In
real life, that's not a safe assumption. I would probably try wrapping the
logic of the decompress method in a try/catch block that catches
ArrayIndexOutOfBoundsException and rethrows it as an IOException (which
is what it does now). It will also be challenging to find a way of properly
testing modifications to this class.
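The try/catch idea could look something like the sketch below. The copy loop
is only a stand-in for the real LZW decode loop, and the class
UncheckedCopySketch, the method expand(), and its parameters are all
hypothetical. The point is just that the inner loop carries no explicit bounds
tests, and a bad index caused by corrupt input surfaces as an IOException.

```java
import java.io.IOException;

// Hypothetical sketch: not the MyLzwDecompressor implementation.
final class UncheckedCopySketch {

    // Copies (offset, length) runs from table into a fixed-size output array.
    static byte[] expand(int[] offsets, int[] lengths, byte[] table,
                         int expectedLength) throws IOException {
        byte[] out = new byte[expectedLength];
        int pos = 0;
        try {
            for (int i = 0; i < offsets.length; i++) {
                for (int j = 0; j < lengths[i]; j++) {
                    // No explicit range checks; rely on the JVM's array checks.
                    out[pos++] = table[offsets[i] + j];
                }
            }
        } catch (ArrayIndexOutOfBoundsException e) {
            // Corrupt input is reported to callers as an I/O failure.
            throw new IOException("Corrupt compressed data", e);
        }
        return out;
    }
}
```

A manual loop is used here rather than System.arraycopy because arraycopy is
documented to throw the broader IndexOutOfBoundsException, which the catch
clause above would not cover.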
I will be looking at this, but probably will not move on it until I get
feedback from the community. I don't view this change as unduly risky
provided that proper care is taken in making the modifications. But the gain
in performance is small enough that I'm not sure it's worth it.
I will also take a look at Commons Compression to see what they do.
If you have any thoughts on this matter, please let me know.