Executive summary: Potentially interesting food for thought for speed
freaks. :-) The conclusion at the end is important.
I'm looking at a method in FOP right now: toByteArray(InputStream, int)
where int is the initial buffer size for the
java.io.ByteArrayOutputStream. It is a hint to that class so buffer
reallocations can be minimized.
Ok, with the new ByteArrayOutputStream it wouldn't be so bad without
such a hint because there will be no unnecessary buffer allocations. But
it still has to do more memory allocations than with the hint.
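
For illustration, a hinted method of this kind could look roughly like
the sketch below. This is not FOP's actual source: the class name
HintedToByteArray and the 4096-byte copy buffer are assumptions made for
the example. The int argument goes straight to the
java.io.ByteArrayOutputStream(int) constructor as its initial capacity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical stand-in for a FOP-style helper, not the real FOP code.
final class HintedToByteArray {

    // initialSize is only a hint: the stream still grows if the input
    // turns out to be larger than the initial capacity.
    static byte[] toByteArray(InputStream in, int initialSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(initialSize);
        byte[] buf = new byte[4096]; // copy buffer size picked arbitrarily
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100000];
        // A caller that knows the size up front can pass it as the hint.
        byte[] copy = toByteArray(new ByteArrayInputStream(data), data.length);
        System.out.println(copy.length);
    }
}
```

With a hint equal to the input size, the internal buffer never has to
grow during the copy.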
Now consider the following method from IOUtil:
public static byte[] toByteArray( final InputStream input, final int bufferSize )
    throws IOException
{
    final ByteArrayOutputStream output = new ByteArrayOutputStream();
    copy( input, output, bufferSize );
    return output.toByteArray();
}
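
The copy() call is not shown here. Assuming it is the usual read/write
loop, the bufferSize argument only sizes the temporary array used to
move bytes from input to output; it never reaches the
ByteArrayOutputStream itself. The CopySketch class below is a
hypothetical stand-in written for this example, not the IOUtil source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical stand-in for the copy helper, assuming a plain read/write loop.
final class CopySketch {

    // bufferSize must be greater than zero; it sizes the transfer array only.
    static long copy(InputStream input, OutputStream output, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = input.read(buffer)) != -1) {
            output.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    // Same shape as the method quoted above: the target stream starts at
    // its default capacity regardless of bufferSize.
    static byte[] toByteArray(InputStream input, int bufferSize) throws IOException {
        ByteArrayOutputStream output = new ByteArrayOutputStream();
        copy(input, output, bufferSize);
        return output.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[50000];
        System.out.println(toByteArray(new ByteArrayInputStream(data), 4096).length);
    }
}
```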
Most methods that do some copying have a similar variant where you can
specify the bufferSize for the copying. Tweaking that value is said to
have some effect on performance. I wondered whether it has much effect
in this particular case, where a lot of memory allocation and memory
copying happens. So I did some tests:
Making a byte[] from a 12.7MB data package in memory (using
ByteArrayInputStream):

method:                   memory usage:   average time:

The above IOUtil.toByteArray() method:
toByteArray(in)           28820KB         120ms
toByteArray(in, 4096)     28820KB         120ms
toByteArray(in, 13MB)     24872KB         100ms
(the second param here means the buffer size given to the IOUtil.copy
method)

A FOP-style toByteArray() method using java.io.ByteArrayOutputStream:
toByteArray(in, 32)       28820KB         120ms
toByteArray(in, 1024)     28820KB         125ms
toByteArray(in, 13MB)     25135KB         60ms
(the second param here means the initial target buffer size or initial
capacity for the ByteArrayOutputStream)

A FOP-style toByteArray() method using the new ByteArrayOutputStream:
toByteArray(in, 32)       28820KB         60ms
toByteArray(in, 1024)     28820KB         60ms
toByteArray(in, 13MB)     25135KB         60ms
(the second param here means the initial target buffer size or initial
capacity for the ByteArrayOutputStream)

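
A timing comparison along these lines can be sketched as follows. This
is not the harness that produced the figures above: the class name
ToByteArrayTiming, the three warm-up calls and the run count are all
assumptions, the sketch only covers the java.io.ByteArrayOutputStream
case, and it measures time only, not memory usage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical timing harness, not the one used for the figures in this mail.
final class ToByteArrayTiming {

    // FOP-style method: initialSize is the initial capacity of the target stream.
    static byte[] toByteArray(InputStream in, int initialSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(initialSize);
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Average wall-clock milliseconds per call over 'runs' timed calls,
    // after three untimed warm-up calls.
    static double averageMillis(byte[] data, int initialSize, int runs) throws IOException {
        for (int i = 0; i < 3; i++) {
            toByteArray(new ByteArrayInputStream(data), initialSize);
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            toByteArray(new ByteArrayInputStream(data), initialSize);
        }
        return (System.nanoTime() - start) / 1_000_000.0 / runs;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[13_000_000]; // roughly the size of the 12.7MB package
        int[] hints = {32, 1024, data.length};
        for (int hint : hints) {
            System.out.printf("toByteArray(in, %d): %.1f ms%n",
                    hint, averageMillis(data, hint, 10));
        }
    }
}
```

Absolute numbers from such a run depend heavily on the JVM, heap
settings and hardware, so only the relative differences are meaningful.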
Notes:
- The slightly lower memory usage of the method calls that use a 13MB
hint is in each case a result of the memory allocation strategy. The
strategy for the new ByteArrayOutputStream could probably be optimized
to lower memory usage without losing a lot of performance.
- The overall memory usage is around twice the size of the data package
because ByteArrayOutputStream.toByteArray() creates a new byte array
by contract. The original data package is not included in this
calculation.
- Doing the same test using a FileInputStream as input doesn't change
the above figures because after warming up the JVM, the OS has the
12.7MB file in the filesystem cache anyway.
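
The second note, about toByteArray() returning a new array, can be
checked directly against java.io.ByteArrayOutputStream. The sketch
below uses a made-up class name (ToByteArrayCopyDemo) and a three-byte
payload purely for illustration.

```java
import java.io.ByteArrayOutputStream;

// Small demonstration that toByteArray() hands out a fresh copy each time.
final class ToByteArrayCopyDemo {

    // True if two calls return distinct arrays and changing the first
    // one does not affect the data held by the stream.
    static boolean returnsFreshCopy() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {1, 2, 3}, 0, 3);
        byte[] first = out.toByteArray();
        first[0] = 99; // modify the returned array only
        byte[] second = out.toByteArray();
        return first != second && second[0] == 1;
    }

    public static void main(String[] args) {
        System.out.println(returnsFreshCopy());
    }
}
```

Because of this copy, the internal buffer and the returned array both
exist at the moment toByteArray() returns, which accounts for the
roughly doubled footprint.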
Conclusion: I think that the bufferSize parameter makes no sense for
toByteArray(*) methods in IOUtil. Removing those variants reduces the
number of methods in IOUtil and so makes the class easier to take in at
a glance. Changing IOUtil to use the new ByteArrayOutputStream for
toByteArray(*) methods makes sense once it has been tested a bit more.
I can send patches.
Opinions?
Jeremias Maerki
FOP committer