Hi,

> > Just curious, did you run some tests, is it faster than using
> > RandomAccessFile.read?
>
> I'm planning to... It would be really nice to map the entire journal
> file (or the part that already exists when a repository is started)
> into memory for direct access. In theory this should reduce the amount
> of copying taking place, but it's still unclear whether that happens
> in practice.
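For what it's worth, reading through a mapping with java.nio looks roughly like the sketch below (the file name and contents are invented for the example; the real journal format would of course differ):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MappedRead {
    public static void main(String[] args) throws IOException {
        // Stand-in for the journal file; name and contents are invented.
        Path journal = Files.createTempFile("journal", ".log");
        Files.write(journal, "hello journal".getBytes(StandardCharsets.UTF_8));

        try (FileChannel ch = FileChannel.open(journal, StandardOpenOption.READ)) {
            // Map the part of the file that already exists. Reads then go
            // through the OS page cache without the extra copy into a heap
            // buffer that RandomAccessFile.read would need.
            MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            byte[] data = new byte[map.remaining()];
            map.get(data);
            System.out.println(new String(data, StandardCharsets.UTF_8));
        }
        // The mapping stays alive until it is garbage collected, so
        // deleting the file here can fail on Windows.
        Files.deleteIfExists(journal);
    }
}
```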
I guess there will be problems with virtual memory. I don't have much
experience with memory mapped files in Java, but from what I have heard
there are some problems. For example, HSQLDB doesn't use memory mapped
files for large databases because of such problems. In any case, I
wouldn't architect Jackrabbit around memory mapped files.

Your patch exposes java.nio.ByteBuffer via Record.getBuffer(); I'm not
sure why this is required.

> > Journal: "A record is never modified or removed once it has been
> > added to a journal."
> >
> > I agree, if there is a mechanism for removing old journal files.
>
> The way I see it, during normal operation we only need to be able to
> append data to the file, so this shouldn't be a problem. At some
> points the journal file needs to be vacuumed to save space, but that
> operation would typically produce a new file that's then used to
> replace the old journal. I'm not yet sure how often and at which
> points such vacuuming needs to take place.

I'm a bit worried that for long running processes (and Jackrabbit
usually runs on the server) this could be a problem.

> Also, I have an experimental C version of the journal and tree code.
> The idea behind that is possibly to implement a mod_dav module to
> enable direct WebDAV access to the tree structure stored in the
> journal.

I don't understand why you would write a WebDAV server in C. What
advantages does this have? I see there are systems where Java is not
supported, but now that Java is the most common programming language,
and that it has become more open source, I don't see why this would
still be necessary.

> The C implementation could also eventually become a highly
> optimized JNI backend for an NGP implementation,

JNI has a big overhead, both performance-wise and development-wise.
Native implementations have portability, security, and other problems
(for example, memory leaks). There are not many cases where I would
use JNI.
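Back to the append-only design for a moment: the appending itself really is the cheap part. A minimal sketch, with an invented length-prefixed record format just for illustration:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class JournalAppend {
    // Appends one length-prefixed record. Records are never modified or
    // removed afterwards; the 4-byte length prefix is a made-up format.
    static void append(Path journal, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(journal,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.APPEND)) {
            ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
            buf.putInt(record.length).put(record).flip();
            while (buf.hasRemaining()) {
                ch.write(buf);
            }
            ch.force(false); // flush to disk before acknowledging the write
        }
    }

    public static void main(String[] args) throws IOException {
        Path journal = Files.createTempFile("journal", ".log");
        append(journal, "first".getBytes(StandardCharsets.UTF_8));
        append(journal, "second".getBytes(StandardCharsets.UTF_8));
        System.out.println(Files.size(journal)); // (4+5) + (4+6) = 19
        Files.deleteIfExists(journal);
    }
}
```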
The standard libraries don't use JNI a lot for pure processing (as
opposed to things that can't be done in Java at all). Where they do
use it, the native implementation is not pure C, it is C with some
assembler:

- Compression (ZIP and GZIP)
- Checksums (Adler32, CRC32)
- Image decoding
- System.arraycopy

From what I can see, the rest is in Java (for example BigInteger,
BigDecimal, character encoding / decoding, XML processing, the Java
compiler, String). I'm not sure about encryption. I have heard of many
cases where the Java implementation was faster than a C++ version,
some even before JDK 1.5. There are pure Java compression tools and
MP3 decoders, and they are not much slower than native ones - but
development and maintenance are much easier. So I don't buy the
'performance' argument - unless you can show I'm wrong, of course ;-)

> much more control over memory mapping

Sorry, what else do you need except unmapping files? To unmap a file,
there is actually a workaround described at
http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4724038 (directly
call the cleaner object), however I couldn't make it work so far.

Regards,
Thomas
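PS: in case it helps, this is the kind of reflection hack I had in mind for the cleaner workaround. The class and method names here are JDK internals, not a supported API, so treat it as a best-effort sketch that may stop working on any JDK update:

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class UnmapWorkaround {
    // Best-effort unmap using the undocumented cleaner mechanism from
    // Sun bug 4724038. Both code paths rely on JDK internals.
    static boolean tryUnmap(MappedByteBuffer buffer) {
        try {
            // Newer JDKs (9+): sun.misc.Unsafe.invokeCleaner(ByteBuffer)
            Class<?> unsafeClass = Class.forName("sun.misc.Unsafe");
            java.lang.reflect.Field f = unsafeClass.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            Object unsafe = f.get(null);
            unsafeClass.getMethod("invokeCleaner", java.nio.ByteBuffer.class)
                       .invoke(unsafe, buffer);
            return true;
        } catch (Throwable t) {
            // Fall through: older JDKs expose a cleaner on the buffer itself.
        }
        try {
            java.lang.reflect.Method m = buffer.getClass().getMethod("cleaner");
            m.setAccessible(true);
            Object cleaner = m.invoke(buffer);
            cleaner.getClass().getMethod("clean").invoke(cleaner);
            return true;
        } catch (Throwable t) {
            return false; // unmapping is unavailable; GC will do it eventually
        }
    }

    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("journal", ".log");
        Files.write(p, "record".getBytes(StandardCharsets.UTF_8));
        MappedByteBuffer buf;
        try (FileChannel ch = FileChannel.open(p, StandardOpenOption.READ)) {
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        System.out.println(new String(data, StandardCharsets.UTF_8));
        System.out.println("unmapped=" + tryUnmap(buf));
        Files.deleteIfExists(p); // on Windows this needs the unmap to succeed
    }
}
```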
