In general I don't disagree with this sort of optimization, but I think
a good fix is a bit more complicated than what you posted.
Lukas Zapletal wrote:
And here are the fixes:
OutputStream:
/**
 * Writes an array of bytes.
 *
 * @param b
 *          the bytes to write
 * @param length
 *          the number of bytes to write
 * @see InputStream#readBytes(byte[],int,int)
 */
public final void writeBytes(byte[] b, int length)
    throws IOException {
  // for (int i = 0; i < length; i++) writeByte(b[i]);
  if (bufferPosition > 0) // flush buffer
    flush();
  if (length < BUFFER_SIZE) {
    flushBuffer(b, length);
    bufferStart += length; // keep getFilePointer() correct, as in the loop below
  } else {
    int pos = 0;
    int size;
    while (pos < length) {
      if (length - pos < BUFFER_SIZE) {
        size = length - pos;
      } else {
        size = BUFFER_SIZE;
      }
      System.arraycopy(b, pos, buffer, 0, size);
      pos += size;
      flushBuffer(buffer, size);
      bufferStart += size;
    }
  }
}
This forces a flush() each time a byte array of any size is written.
That could be much slower when lots of small byte arrays are written,
since flush() invokes a system call. It would be best, when there is
room in the buffer, to simply use System.arraycopy to append the new
data to the buffer, with no flush. If the new data is larger than a
buffer, then the buffer should be flushed and the new data written
directly, without ever copying it into the buffer. If the new data is
smaller than a buffer but larger than the room available in the current
buffer, then it should be used to fill the current buffer, which should
then be flushed, and the remainder should be copied into the buffer.
Does that sound right?
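A minimal sketch of the write policy described above, in Java. It assumes a hypothetical SketchOutputStream class (not Lucene's actual OutputStream), a 1024-byte BUFFER_SIZE, and an abstract flushBuffer(byte[], int) that writes bytes starting at index 0 of the array, modeled on the call in the quoted patch. The class and field names are illustrative only.

```java
import java.io.IOException;

// Hypothetical sketch of the proposed buffering policy; not Lucene's actual API.
abstract class SketchOutputStream {
  static final int BUFFER_SIZE = 1024;             // assumed buffer size
  private final byte[] buffer = new byte[BUFFER_SIZE];
  private long bufferStart = 0;                    // file position of buffer[0]
  private int bufferPosition = 0;                  // bytes currently buffered

  /** Writes b[0..len) to the underlying file (e.g. one system call). */
  protected abstract void flushBuffer(byte[] b, int len) throws IOException;

  /** Writes the first `length` bytes of b. */
  public final void writeBytes(byte[] b, int length) throws IOException {
    int room = BUFFER_SIZE - bufferPosition;
    if (length <= room) {
      // Fits in the free space: append only, no flush.
      System.arraycopy(b, 0, buffer, bufferPosition, length);
      bufferPosition += length;
    } else if (length > BUFFER_SIZE) {
      // Larger than a whole buffer: flush what is pending, then write b
      // directly without copying it into the buffer.
      flush();
      flushBuffer(b, length);
      bufferStart += length;
    } else {
      // Smaller than a buffer but larger than the free space: fill the
      // buffer, flush it, then copy the remainder to the start of the buffer.
      System.arraycopy(b, 0, buffer, bufferPosition, room);
      bufferPosition = BUFFER_SIZE;
      flush();
      System.arraycopy(b, room, buffer, 0, length - room);
      bufferPosition = length - room;
    }
  }

  /** Writes any buffered bytes; does nothing if the buffer is empty. */
  public final void flush() throws IOException {
    if (bufferPosition > 0) {
      flushBuffer(buffer, bufferPosition);
      bufferStart += bufferPosition;
      bufferPosition = 0;
    }
  }

  public final long getFilePointer() {
    return bufferStart + bufferPosition;
  }
}
```

With this policy a run of small writes never reaches flushBuffer until the buffer is actually full, while a write larger than the buffer costs at most two flushBuffer calls and no extra copy.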
InputStream:
public final void readBytes(byte[] b, int offset, int len)
    throws IOException {
  // if (len < BUFFER_SIZE) { // not required
  //   for (int i = 0; i < len; i++)
  //     // read byte-by-byte
  //     b[i + offset] = (byte) readByte();
  // } else { // read all-at-once
  long start = getFilePointer();
  seekInternal(start);
  readInternal(b, offset, len);
  bufferStart = start + len;
  bufferPosition = 0;
  bufferLength = 0;
  // }
}
Again, this could be much slower when lots of small arrays are read,
since each call forces seek and read system calls. However, this could
be optimized, for the case where the desired data resides entirely in
the current buffer, to use System.arraycopy.
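One way the fully-buffered fast path might look, again as a hypothetical Java sketch rather than Lucene's code. SketchInputStream, its refill() helper and the 1024-byte BUFFER_SIZE are assumptions; seekInternal and readInternal mirror the calls in the patch above. Reads that fit in the bytes still buffered are served with System.arraycopy; anything else falls back to the patch's direct seek-and-read path.

```java
import java.io.IOException;

// Hypothetical sketch of a buffered readBytes fast path; not Lucene's actual API.
abstract class SketchInputStream {
  static final int BUFFER_SIZE = 1024;             // assumed buffer size
  private final byte[] buffer = new byte[BUFFER_SIZE];
  private long bufferStart = 0;                    // file position of buffer[0]
  private int bufferLength = 0;                    // valid bytes in buffer
  private int bufferPosition = 0;                  // next byte to read from buffer
  protected long length;                           // file length, set by subclasses

  protected abstract void seekInternal(long pos) throws IOException;
  protected abstract void readInternal(byte[] b, int offset, int len) throws IOException;

  public final long getFilePointer() {
    return bufferStart + bufferPosition;
  }

  public final byte readByte() throws IOException {
    if (bufferPosition >= bufferLength) refill();
    return buffer[bufferPosition++];
  }

  public final void readBytes(byte[] b, int offset, int len) throws IOException {
    if (len <= bufferLength - bufferPosition) {
      // Fast path: every requested byte is already buffered, no system call.
      System.arraycopy(buffer, bufferPosition, b, offset, len);
      bufferPosition += len;
    } else {
      // Slow path, as in the patch: read straight into b, then empty the buffer.
      long start = getFilePointer();
      seekInternal(start);
      readInternal(b, offset, len);
      bufferStart = start + len;
      bufferPosition = 0;
      bufferLength = 0;
    }
  }

  // Hypothetical helper: loads up to BUFFER_SIZE bytes starting at the file pointer.
  private void refill() throws IOException {
    long start = getFilePointer();
    long end = Math.min(start + BUFFER_SIZE, length); // don't read past EOF
    if (end <= start) throw new IOException("read past EOF");
    seekInternal(start);                             // defensive re-positioning
    readInternal(buffer, 0, (int) (end - start));
    bufferStart = start;
    bufferLength = (int) (end - start);
    bufferPosition = 0;
  }
}
```

Reads that straddle the end of the buffer still take the slow path here; copying the buffered part and refilling for the rest would be a further refinement.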
There is a significant time improvement for writing and a slight one for
reading. I also recommend setting the buffer size to 8 or 16 kilobytes.
In certain cases Lucene allocates many stream buffers, so making them
larger can greatly increase the amount of memory used. Also, the
filesystem should optimize sequential reads, so the primary improvement
seen with a larger buffer size is fewer system calls. In my experience,
a buffer of 1k or so is usually large enough that system call overhead
is minimal.
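To make the trade-off concrete, here is a back-of-envelope model in Java. The stream count (500) and file size (10 MiB) are made-up illustrative numbers, and treating each buffer refill as one read system call is a simplifying assumption that only holds for purely sequential reads.

```java
// Hypothetical back-of-envelope model of the buffer-size trade-off.
class BufferTradeoff {
  /** Total buffer memory when each of `openStreams` streams holds one buffer. */
  static long bufferMemory(int openStreams, int bufferSize) {
    return (long) openStreams * bufferSize;
  }

  /** Buffer refills (roughly, read system calls) to scan `fileSize` bytes sequentially. */
  static long refills(long fileSize, int bufferSize) {
    return (fileSize + bufferSize - 1) / bufferSize; // ceiling division
  }

  public static void main(String[] args) {
    int streams = 500;           // assumed number of open streams
    long fileSize = 10L << 20;   // assumed 10 MiB file
    for (int size : new int[] {1 << 10, 8 << 10, 16 << 10}) {
      System.out.printf("%d-byte buffers: %d bytes of buffers, %d refills%n",
          size, bufferMemory(streams, size), refills(fileSize, size));
    }
  }
}
```

With these assumed numbers, going from 1k to 16k buffers multiplies buffer memory by 16 while cutting refills from 10,240 to 640; whether that matters depends on how much each system call actually costs.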
Doug