Executive summary: Potentially interesting food for thought for speed
freaks. :-) The conclusion at the end is important.

I'm looking at a method in FOP right now: toByteArray(InputStream, int),
where the int is the initial buffer size for the
java.io.ByteArrayOutputStream. It is a hint to that class so that buffer
reallocations can be minimized.

Ok, with the new ByteArrayOutputStream it wouldn't be so bad without
such a hint because there will be no unnecessary buffer allocations. But
it still has to do more memory allocations than with the hint.
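
To make the role of the hint concrete, here is a minimal sketch of such a
method. The class name SizeHintExample and the 4096-byte copy chunk are
assumptions for illustration only; this is not the actual FOP source.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class; not the actual FOP or IOUtil source.
class SizeHintExample {

    // initialSize is only a hint: it sets the initial capacity of the
    // target buffer, which still grows if the input turns out larger.
    static byte[] toByteArray(InputStream in, int initialSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(initialSize);
        byte[] chunk = new byte[4096]; // copy chunk size, chosen arbitrarily here
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }
}
```

With a hint at least as large as the input, the target buffer never has to
be reallocated while copying; with a small hint the result is the same, it
just takes more allocations to get there.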

Now consider the following method from IOUtil:

    public static byte[] toByteArray( final InputStream input, final int bufferSize )
        throws IOException
    {
        final ByteArrayOutputStream output = new ByteArrayOutputStream();
        copy( input, output, bufferSize );
        return output.toByteArray();
    }

Most of the copying methods have a variant like this where you can
specify the bufferSize used for the copying. Tweaking that value is said
to have some effect on performance. I wondered whether it has much effect
in this particular case, where a lot of memory allocation and memory
copying happens. So I did some tests:

Making a byte[] from a 12.7MB data package in memory (using
ByteArrayInputStream):

method:                   memory usage:  average time:

The above IOUtil.toByteArray() method:

toByteArray(in)           28820kb        120ms
toByteArray(in, 4096)     28820kb        120ms
toByteArray(in, 13MB)     24872kb        100ms

(the second param here means the buffer size given to the IOUtil.copy
method)

A FOP-style toByteArray() method using java.io.ByteArrayOutputStream:
toByteArray(in, 32)       28820kb        120ms
toByteArray(in, 1024)     28820kb        125ms
toByteArray(in, 13MB)     25135kb         60ms

(the second param here means the initial target buffer size or initial
capacity for the ByteArrayOutputStream)

A FOP-style toByteArray() method using the new ByteArrayOutputStream:
toByteArray(in, 32)       28820kb         60ms
toByteArray(in, 1024)     28820kb         60ms
toByteArray(in, 13MB)     25135kb         60ms

(the second param here means the initial target buffer size or initial
capacity for the ByteArrayOutputStream)
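
For anyone who wants to repeat this kind of measurement, a timing harness
could look roughly like the sketch below. It is not the code used for the
figures above: the class name, the warm-up and run counts and the
zero-filled test data are all assumptions, and it measures time only, not
memory usage.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical timing harness; names and run counts are illustrative.
class ToByteArrayTiming {

    // FOP-style variant: initialSize is the initial capacity of the target buffer.
    static byte[] toByteArray(InputStream in, int initialSize) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(initialSize);
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    // Average wall-clock time in milliseconds per conversion, measured
    // after some warm-up runs so the JIT has had a chance to kick in.
    static double averageMillis(byte[] data, int initialSize, int warmups, int runs)
            throws IOException {
        for (int i = 0; i < warmups; i++) {
            toByteArray(new ByteArrayInputStream(data), initialSize);
        }
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            toByteArray(new ByteArrayInputStream(data), initialSize);
        }
        return (System.nanoTime() - start) / 1e6 / runs;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[13 * 1000 * 1000]; // roughly the size of the test package
        for (int hint : new int[] {32, 1024, data.length}) {
            System.out.printf("toByteArray(in, %d): %.1f ms%n",
                    hint, averageMillis(data, hint, 5, 20));
        }
    }
}
```

Absolute numbers will differ from machine to machine and from JVM to JVM,
so only the relative differences between the hints are worth comparing.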

Notes:
- The slightly lower memory usage of the method calls that use a 13MB
  hint is in both cases a result of memory allocation strategies. The
  strategy for the new ByteArrayOutputStream could probably be optimized
  to lower memory usage without losing a lot of performance.
- The fact that the overall memory usage is around twice the size of the
  data package is due to the fact that
  ByteArrayOutputStream.toByteArray() creates a new byte array by
  contract. The original data package is not included in this
  calculation.
- Doing the same test using a FileInputStream as input doesn't change
  the above figures because after warming up the JVM, the OS has the
  12.7MB file in the filesystem cache anyway.
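
The copy made by toByteArray(), mentioned in the notes above, is easy to
see with java.io.ByteArrayOutputStream. A small sketch (the class and
method names are made up for this example):

```java
import java.io.ByteArrayOutputStream;

// Hypothetical demo class showing that toByteArray() returns a fresh copy.
class CopyByContract {

    static boolean returnsIndependentCopies() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(new byte[] {1, 2, 3}, 0, 3);

        byte[] first = out.toByteArray();
        first[0] = 42; // modify the returned array only

        byte[] second = out.toByteArray();
        // A different array object each time, and the stream's own
        // buffer was not affected by the change to the first copy.
        return first != second && second[0] == 1;
    }
}
```

So at the moment toByteArray() returns, the internal buffer and the
returned array both exist, which accounts for the factor of two.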



Conclusion: I think that the bufferSize parameter makes no sense for the
toByteArray(*) methods in IOUtil. Removing those variants reduces the
number of methods in IOUtil and thereby makes the class easier to get an
overview of. Changing IOUtil to use the new ByteArrayOutputStream for the
toByteArray(*) methods makes sense once it is a bit more tested. I can
send patches.

Opinions?

Jeremias Maerki
FOP committer
