Sounds like an excellent enhancement. From a quick look, it appears that you are right and everything should work just fine but use less memory. One question: have you tried the other test cases also or just the TestCompoundFile. There are quite a few conditions that TestCompoundFile does not cover.
At first I thought that the synchronization around readBytes would cause too much performance degradation when a lot of concurrent queries were executing. But after I looked at it some more, I convinced myself that it should be ok. Have you ran any multi-threaded tests / benchmarks? I think it might also be a good idea before making this change.
Christoph, do you think it is possible to just call readInternal on the base stream instead of the readBytes? The main difference is that we would bypass the buffering in the base stream. I think the buffering done by the superclass of the CSInputStream would be quite enough (which is your point to begin with, right)? Perhaps it would be worthwhile to make InputStream.readInternal() public instead of protected?
Thanks. Dmitry.
Christoph Goller wrote:
Dear Dmitry,
I finally found time to look into your compound file implementation and
to try it out. It works and I like it. However, I wondered, why you are
using clones of the base input stream in CompoundFileReader.CSInputStream.
The "problem" (actually it is not a real problem) I see is that you are
double buffering your input. Note that the clone has a buffer of 1024 bytes and
that CompoundFileReader.CSInputStream, which extends InputStream, also has
such a buffer. I think in the way you implemented it your input gets copied
into the base buffer and from there into the CompoundFileReader.CSInputStream
buffer before it is actually used. Please have a look at my patch. At first
one might think that my implementation is simpler but less efficient. This
is not the case. Actually it is even a little bit more efficient. Points to
think about:
*) stream is private to CompoundFileReader, so nobody else can use it, if not
explicitly granted access. CompoundFileReader uses stream only in its
constructor and when calling openFile.
*) Therefore, CompoundFileReader.CSInputStream can share the same instance of
stream if they take care of synchronization and seek.
*) An InputStream not necessarily has to implement seekInternal if readInternal
takes care of seeking on the real stream (see FSInputStream).
*) Synchronization of CompoundFileReader.CSInputStream.readInternal on base
does no harm since it is called only when the real file has to be accessed
and thi is synchronized anyway. However, this synchronizytion is necessary!
I tested my patch with your TestCompoundFile and everything seems fine.
I also tried it on one of my indices and searched on this index. It also
works. My impression from my tests is that my patch is a little bit faster
than your original version but there seems to be not much difference in
efficiency.
Christoph
------------------------------------------------------------------------
Index: CompoundFileReader.java =================================================================== RCS file: /home/cvs/jakarta-lucene/src/java/org/apache/lucene/index/CompoundFileReader.java,v retrieving revision 1.2 diff -u -r1.2 CompoundFileReader.java --- CompoundFileReader.java 13 Oct 2003 14:18:04 -0000 1.2 +++ CompoundFileReader.java 16 Oct 2003 18:58:02 -0000 @@ -240,10 +240,9 @@ final long length) throws IOException { - this.base = (InputStream) base.clone(); + this.base = base; this.fileOffset = fileOffset; this.length = length; // variable in the superclass - seekInternal(0); // position to the adjusted 0th byte }
/** Expert: implements buffer refill. Reads bytes from the current @@ -255,7 +254,10 @@ protected void readInternal(byte[] b, int offset, int len) throws IOException { - base.readBytes(b, offset, len); + synchronized (base) { + base.seek(fileOffset + getFilePointer()); + base.readBytes(b, offset, len); + } }
/** Expert: implements seek. Sets current position in this file, where @@ -263,35 +265,11 @@ * @see #readInternal(byte[],int,int) */ protected void seekInternal(long pos) throws IOException - { - if (pos > 0 && pos >= length) - throw new IOException("Seek past the end of file"); - - if (pos < 0) - throw new IOException("Seek to a negative offset"); - - base.seek(fileOffset + pos); - } + {}
/** Closes the stream to futher operations. */ public void close() throws IOException - { - base.close(); - } + {}
- /** Returns a clone of this stream. - * - * <p>Clones of a stream access the same data, and are positioned at the same - * point as the stream they were cloned from. - * - * <p>Expert: Subclasses must ensure that clones may be positioned at - * different points in the input from each other and from the stream they - * were cloned from. - */ - public Object clone() { - CSInputStream other = (CSInputStream) super.clone(); - other.base = (InputStream) base.clone(); - return other; - } } }
------------------------------------------------------------------------
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]