In general I don't disagree with this sort of optimization, but I think a good fix is a bit more complicated than what you posted.

Lukas Zapletal wrote:
And here come the fixes:

OutputStream:

        /**
         * Writes an array of bytes.
         *
         * @param b
         *            the bytes to write
         * @param length
         *            the number of bytes to write
         * @see InputStream#readBytes(byte[],int,int)
         */
        public final void writeBytes(byte[] b, int length) throws IOException {
                // for (int i = 0; i < length; i++) writeByte(b[i]);

                if (bufferPosition > 0) // flush buffer
                        flush();

                if (length < BUFFER_SIZE) {
                        flushBuffer(b, length);
                } else {
                        int pos = 0;
                        int size;
                        while (pos < length) {
                                if (length - pos < BUFFER_SIZE) {
                                        size = length - pos;
                                } else {
                                        size = BUFFER_SIZE;
                                }
                                System.arraycopy(b, pos, buffer, 0, size);
                                pos += size;
                                flushBuffer(buffer, size);
                                bufferStart += size;
                        }
                }
        }

This forces a flush() each time a byte array of any size is written. That could be much slower when lots of small byte arrays are written, since flush() invokes a system call. What would be best is: if there is room in the buffer, simply use System.arraycopy to append the new data to the buffer, with no flush. If the new data is larger than a buffer, then the buffer should be flushed and the new data written directly, without ever copying it into the buffer. If the new data is smaller than a buffer but larger than the room available in the current buffer, then it should be used to fill the current buffer, which should then be flushed, with the remainder copied into the now-empty buffer. Does that sound right?
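Something like this, perhaps. It's only a sketch, and it assumes, as your patch seems to, that flush() writes buffer[0..bufferPosition) to the file, advances bufferStart and resets bufferPosition, and that flushBuffer(byte[],int) writes bytes straight through without touching the buffer:

        public final void writeBytes(byte[] b, int length) throws IOException {
                if (length <= BUFFER_SIZE - bufferPosition) {
                        // fits in the space left in the buffer:
                        // just append, no flush
                        System.arraycopy(b, 0, buffer, bufferPosition, length);
                        bufferPosition += length;
                } else if (length >= BUFFER_SIZE) {
                        // at least a whole buffer's worth: flush what is
                        // buffered, then write the new data directly,
                        // without ever copying it into the buffer
                        if (bufferPosition > 0)
                                flush();
                        flushBuffer(b, length);
                        bufferStart += length;
                } else {
                        // smaller than a buffer, but more than the room
                        // left: top off the buffer, flush it, then copy
                        // the remainder into the now-empty buffer
                        int avail = BUFFER_SIZE - bufferPosition;
                        System.arraycopy(b, 0, buffer, bufferPosition, avail);
                        bufferPosition = BUFFER_SIZE;
                        flush();
                        System.arraycopy(b, avail, buffer, 0, length - avail);
                        bufferPosition = length - avail;
                }
        }

That way the common case, lots of small writes, costs only an arraycopy per call.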

InputStream:

        public final void readBytes(byte[] b, int offset, int len)
                        throws IOException {
                // if (len < BUFFER_SIZE) { // not required
                //         for (int i = 0; i < len; i++) // read byte-by-byte
                //                 b[i + offset] = (byte) readByte();
                // } else { // read all-at-once
                long start = getFilePointer();
                seekInternal(start);
                readInternal(b, offset, len);

                bufferStart = start + len;
                bufferPosition = 0;
                bufferLength = 0;
                // }
        }

Again, this could be much slower when lots of small arrays are read, since each call forces seek and read system calls. However, it could be optimized with System.arraycopy for the case where the desired data resides entirely in the current buffer.
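Again just a sketch, using the same fields (buffer, bufferStart, bufferPosition, bufferLength) as your patch:

        public final void readBytes(byte[] b, int offset, int len)
                        throws IOException {
                if (len <= bufferLength - bufferPosition) {
                        // the requested bytes are already buffered:
                        // just an arraycopy, no system calls
                        System.arraycopy(buffer, bufferPosition, b, offset, len);
                        bufferPosition += len;
                } else {
                        // read straight from the file and invalidate
                        // the buffer
                        long start = getFilePointer();
                        seekInternal(start);
                        readInternal(b, offset, len);
                        bufferStart = start + len;
                        bufferPosition = 0;
                        bufferLength = 0;
                }
        }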

There is a significant time improvement for writing and a slight one for
reading. I also recommend setting the buffer to 8 or 16 kilobytes.

In certain cases Lucene allocates many stream buffers. Making these larger can thus greatly increase the amount of memory used. Also, the filesystem should optimize sequential reads so that the primary improvement seen with a larger buffer size is fewer system calls. In my experience, a buffer of 1k or so is usually large enough so that the system call overheads are minimal.
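If you want to check that on your filesystem, a throwaway harness along these lines (hypothetical code, not part of Lucene) will show where the copy times flatten out as the buffer grows:

        import java.io.*;

        // copies the file named by args[0] through buffers of
        // increasing size and reports the elapsed time for each
        public class BufferSizeTest {
                public static void main(String[] args) throws IOException {
                        for (int size = 512; size <= 16384; size *= 2) {
                                byte[] buf = new byte[size];
                                long start = System.currentTimeMillis();
                                InputStream in = new FileInputStream(args[0]);
                                OutputStream out = new FileOutputStream("copy.tmp");
                                int n;
                                while ((n = in.read(buf)) != -1)
                                        out.write(buf, 0, n);
                                out.close();
                                in.close();
                                System.out.println(size + " byte buffer: "
                                                + (System.currentTimeMillis() - start)
                                                + " ms");
                        }
                }
        }

If what I said above holds, the times should level off well before 8k.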

Doug
