Hi,

> > Just curious, did you run some tests, is it faster than using
> > RandomAccessFile.read?
>
> I'm planning to... It would be really nice to map the entire journal
> file (or the part that already exists when a repository is started)
> into memory for direct access. In theory this should reduce the amount
> of copying taking place, but it's still unclear whether that happens
> in practice.
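For what it's worth, reading through a mapping with java.nio looks roughly like the sketch below (the file name and contents are invented for the example; the real journal format would of course differ):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        // Stand-in for the journal file; name and contents are invented.
        Path journal = Files.createTempFile("journal", ".log");
        Files.write(journal, "hello journal".getBytes(StandardCharsets.UTF_8));

        try (FileChannel ch = FileChannel.open(journal, StandardOpenOption.READ)) {
            // Map the part of the file that already exists. Reads then go
            // through the OS page cache without the extra copy into a heap
            // buffer that RandomAccessFile.read would need.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] data = new byte[map.remaining()];
            map.get(data);
            System.out.println(new String(data, StandardCharsets.UTF_8));
        }
        // The mapping stays alive until it is garbage collected, so
        // deleting the file here can fail on Windows.
        Files.deleteIfExists(journal);
    }
}
```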
I guess there will be problems with virtual memory. I don't have much
experience with memory mapped files in Java, but from what I have heard
there are some problems. For example, HSQLDB doesn't use memory mapped
files for large databases because of such problems. In any case, I
wouldn't architect Jackrabbit around memory mapped files.

Your patch exposes java.nio.ByteBuffer via Record.getBuffer(); I'm not
sure why this is required.

> > Journal: "A record is never modified or removed once it has been
> > added to a journal."
> >
> > I agree, if there is a mechanism for removing old journal files.
>
> The way I see it, during normal operation we only need to be able to
> append data to the file, so this shouldn't be a problem. At some
> points the journal file needs to be vacuumed to save space, but that
> operation would typically produce a new file that's then used to
> replace the old journal. I'm not yet sure how often and at which
> points such vacuuming needs to take place.

I'm a bit worried that for long running processes (and Jackrabbit
usually runs on the server) this could be a problem.

> Also, I have an experimental C version of the journal and tree code.
> The idea behind that is possibly to implement a mod_dav module to
> enable direct WebDAV access to the tree structure stored in the
> journal.

I don't understand why you would write a WebDAV server in C. What
advantages does this have? I see there are systems where Java is not
supported, but now that Java is the most common programming language,
and that it has become more open source, I don't see why this would
still be necessary.

> The C implementation could also eventually become a highly
> optimized JNI backend for an NGP implementation,

JNI has a big overhead, both performance-wise and development-wise.
Native implementations have portability, security, and other problems
(for example, memory leaks). There are not many cases where I would
use JNI.
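Back to the append-only design for a moment: the appending itself really is the cheap part. A minimal sketch, with an invented length-prefixed record format just for illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class JournalAppend {
    // Appends one length-prefixed record. Records are never modified or
    // removed afterwards; the 4-byte length prefix is a made-up format.
    static void append(Path journal, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(journal,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
            buf.putInt(record.length).put(record).flip();
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(false); // flush to disk before acknowledging the write
        }
    }

    public static void main(String[] args) throws IOException {
        Path journal = Files.createTempFile("journal", ".log");
        append(journal, "first".getBytes(StandardCharsets.UTF_8));
        append(journal, "second".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(journal)); // (4+5) + (4+6) = 19
        Files.deleteIfExists(journal);
    }
}
```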
The standard libraries don't use JNI a lot for pure processing (as
opposed to things that can't be done in Java at all). Where they do
use it, the native implementation is not pure C, it is C with some
assembler:

- Compression (ZIP and GZIP)
- Checksums (Adler32, CRC32)
- Image decoding
- System.arraycopy

From what I can see, the rest is in Java (for example BigInteger,
BigDecimal, character encoding / decoding, XML processing, the Java
compiler, String). I'm not sure about encryption. I have heard of many
cases where the Java implementation was faster than a C++ version,
some even before JDK 1.5. There are pure Java compression tools and
MP3 decoders, and they are not much slower than native ones - but
development and maintenance are much easier. So I don't buy the
'performance' argument - unless you can show I'm wrong, of course ;-)

> much more control over memory mapping

Sorry, what else do you need except unmapping files? To unmap a file,
there is actually a workaround described at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038 (directly
call the cleaner object), however I couldn't make it work so far.

Regards,
Thomas
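PS: in case it helps, this is the kind of reflection hack I had in mind for the cleaner workaround. The class and method names here are JDK internals, not a supported API, so treat it as a best-effort sketch that may stop working on any JDK update:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UnmapWorkaround {
    // Best-effort unmap using the undocumented cleaner mechanism from
    // Sun bug 4724038. Both code paths rely on JDK internals.
    static boolean tryUnmap(MappedByteBuffer buffer) {
        try {
            // Newer JDKs (9+): sun.misc.Unsafe.invokeCleaner(ByteBuffer)
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            java.lang.reflect.Field f = unsafeClass.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Object unsafe = f.get(null);
            unsafeClass.getMethod("invokeCleaner", java.nio.ByteBuffer.class)
                       .invoke(unsafe, buffer);
            return true;
        } catch (Throwable t) {
            // Fall through: older JDKs expose a cleaner on the buffer itself.
        }
        try {
            java.lang.reflect.Method m = buffer.getClass().getMethod("cleaner");
            m.setAccessible(true);
            Object cleaner = m.invoke(buffer);
            cleaner.getClass().getMethod("clean").invoke(cleaner);
            return true;
        } catch (Throwable t) {
            return false; // unmapping is unavailable; GC will do it eventually
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("journal", ".log");
        Files.write(p, "record".getBytes(StandardCharsets.UTF_8));
        MappedByteBuffer buf;
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        System.out.println(new String(data, StandardCharsets.UTF_8));
        System.out.println("unmapped=" + tryUnmap(buf));
        Files.deleteIfExists(p); // on Windows this needs the unmap to succeed
    }
}
```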
