I have now run all the test cases. They all pass except for the Russian analyser/stemmer tests and occasional failures of TestIndexReader (the timestamp problem). So I think we should be OK as far as CompoundFile is concerned. Of course, we still have to find a good solution for the timestamp problem.
However, I stumbled over a problem that I had missed last time. TestCompoundFile only succeeds with your index bounds tests in CSInputStream.seekInternal. On Thursday I had deleted them after trying your test cases, because the other implementations don't do these tests either. I did not go too deep into your tests, but do you think that throwing an exception when the seek index is out of bounds is required behaviour? It's not part of the contract of the other implementations of InputStream. Maybe I am missing something here.
Dmitry Serebrennikov wrote:
Dear Christoph,
Sounds like an excellent enhancement. From a quick look, it appears that you are right and everything should work just fine but use less memory. One question: have you tried the other test cases as well, or just TestCompoundFile? There are quite a few conditions that TestCompoundFile does not cover.
At first I thought that the synchronization around readBytes would cause too much performance degradation when a lot of concurrent queries were executing. But after I looked at it some more, I convinced myself that it should be OK. Have you run any multi-threaded tests / benchmarks? I think that might also be a good idea before making this change.
Christoph, do you think it is possible to just call readInternal on the base stream instead of readBytes? The main difference is that we would bypass the buffering in the base stream. I think the buffering done by the superclass of CSInputStream would be quite enough (which is your point to begin with, right?). Perhaps it would be worthwhile to make InputStream.readInternal() public instead of protected?
In CSInputStream.readInternal I call:
    synchronized (base) {
        base.seek(fileOffset + getFilePointer());
        base.readBytes(b, offset, len);
    }
Calling base.seek does nothing more than set the file pointer (bufferStart + bufferPosition) of base correctly.
base.readBytes(b, offset, len) in this case does not use the buffer of base (at least in most cases). Look into InputStream.readBytes. If len >= BUFFER_SIZE the base buffer is skipped and the buffer b is used directly.
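To illustrate the point, here is a minimal sketch of that bypass. It is a simplified model, not the actual Lucene source: the class names, the tiny BUFFER_SIZE, the refills counter, and the explicit pos argument to readInternal are all assumptions made for the example.

```java
class BufferBypassSketch {
    static final int BUFFER_SIZE = 8; // hypothetical, kept tiny for the example

    /** Simplified model of a buffered stream; not the real InputStream class. */
    static abstract class SketchInputStream {
        private final byte[] buffer = new byte[BUFFER_SIZE];
        private long bufferStart = 0;   // file position of buffer[0]
        private int bufferLength = 0;   // number of valid bytes in the buffer
        private int bufferPosition = 0; // next byte to hand out from the buffer
        int refills = 0;                // counts how often the buffer was filled

        /** Reads len bytes starting at file position pos (pos is an assumption). */
        protected abstract void readInternal(byte[] b, int offset, int len, long pos);
        protected abstract long length();

        long getFilePointer() { return bufferStart + bufferPosition; }

        /** Only moves the file pointer; no I/O happens here. */
        void seek(long pos) {
            if (pos >= bufferStart && pos < bufferStart + bufferLength) {
                bufferPosition = (int) (pos - bufferStart);
            } else {
                bufferStart = pos;
                bufferPosition = 0;
                bufferLength = 0; // forces a refill on the next small read
            }
        }

        byte readByte() {
            if (bufferPosition >= bufferLength) refill();
            return buffer[bufferPosition++];
        }

        void readBytes(byte[] b, int offset, int len) {
            if (len < BUFFER_SIZE) {
                // Small read: served byte by byte through the internal buffer.
                for (int i = 0; i < len; i++) b[offset + i] = readByte();
            } else {
                // Large read: skip the internal buffer and fill b directly.
                long start = getFilePointer();
                readInternal(b, offset, len, start);
                bufferStart = start + len;
                bufferPosition = 0;
                bufferLength = 0;
            }
        }

        private void refill() {
            long start = bufferStart + bufferPosition;
            int n = (int) Math.min(BUFFER_SIZE, length() - start);
            if (n <= 0) throw new IllegalStateException("read past EOF");
            readInternal(buffer, 0, n, start);
            refills++;
            bufferStart = start;
            bufferLength = n;
            bufferPosition = 0;
        }
    }

    /** In-memory stand-in for a file-backed stream. */
    static final class ByteArrayStream extends SketchInputStream {
        private final byte[] data;
        ByteArrayStream(byte[] data) { this.data = data; }
        @Override protected void readInternal(byte[] b, int offset, int len, long pos) {
            System.arraycopy(data, (int) pos, b, offset, len);
        }
        @Override protected long length() { return data.length; }
    }
}
```

In this model, a readBytes call with len >= BUFFER_SIZE leaves refills unchanged, while a following short read triggers exactly one refill. That is the behaviour I am relying on when the buffering superclass of CSInputStream asks base for a full buffer at a time.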
I think the synchronized block in our case does not cost much more than the synchronization on the actual file that FSInputStream.readInternal already does.
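As a rough sketch of why the lock is needed at all: several sub-streams share one base stream with a single file pointer, so seek and read have to happen as one atomic step. The Base and SubStream classes and the demo method below are hypothetical stand-ins for this example, not the real CSInputStream code.

```java
class SharedBaseSketch {
    /** Hypothetical shared base stream: a single file pointer over the data. */
    static final class Base {
        private final byte[] data;
        private int pos;
        Base(byte[] data) { this.data = data; }
        void seek(int p) { pos = p; }
        void readBytes(byte[] b, int offset, int len) {
            System.arraycopy(data, pos, b, offset, len);
            pos += len;
        }
    }

    /** Hypothetical view of the range [fileOffset, fileOffset + length) of base. */
    static final class SubStream {
        private final Base base;
        private final int fileOffset;
        private final int length;
        private int filePointer; // position relative to fileOffset

        SubStream(Base base, int fileOffset, int length) {
            this.base = base;
            this.fileOffset = fileOffset;
            this.length = length;
        }

        void read(byte[] b, int offset, int len) {
            if (filePointer + len > length) {
                throw new IllegalArgumentException("read past end of sub-stream");
            }
            // seek + read must not be interleaved with another sub-stream's
            // seek + read, because both move the same pointer in base.
            synchronized (base) {
                base.seek(fileOffset + filePointer);
                base.readBytes(b, offset, len);
            }
            filePointer += len;
        }
    }

    /** Two threads read disjoint ranges through the same base; true if all data matched. */
    static boolean demo() {
        byte[] data = new byte[100];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        Base base = new Base(data);
        boolean[] ok = {true, true};
        Thread[] threads = new Thread[2];
        for (int t = 0; t < 2; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                byte[] buf = new byte[10];
                for (int round = 0; round < 1000; round++) {
                    SubStream s = new SubStream(base, id * 50, 50);
                    for (int k = 0; k < 5; k++) {
                        s.read(buf, 0, 10);
                        for (int j = 0; j < 10; j++) {
                            if (buf[j] != (byte) (id * 50 + k * 10 + j)) ok[id] = false;
                        }
                    }
                }
            });
            threads[t].start();
        }
        try {
            for (Thread th : threads) th.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return ok[0] && ok[1];
    }
}
```

The lock is held only for the duration of one seek plus one bulk read, which is why I do not expect it to add much beyond the locking that already happens at the file level. A real multi-threaded benchmark would still be needed to confirm that.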
Christoph