Hi,

>>> But I don't think we should try to increase concurrency of write
>>> operations within the *same* repository because that's not a problem at
>>> all.
>>
>> i beg to differ ;)
>>
>> in jr2 saves are serialized. IMO that's a *real* problem, especially when
>> saving large change sets. this problem can be addressed e.g. with an
>> MVCC based model.
The problem with Jackrabbit isn't that concurrency for write operations
is bad: the problem is that throughput is bad. Increasing concurrency in
the save operation will not improve throughput in a meaningful way (most
likely it will decrease it). I'm also not aware of a big problem with
large change sets; in any case, large change sets should be split up
into smaller sets. For me, increasing throughput is a lot more important
than increasing concurrency.

> Yes, I agree. It's something I've seen many times in the field
> (consider saving a large pdf in a cms).

Large PDFs are stored in the data store. Large binaries are stored there
well before the save operation, so they are not part of the save
operation at all. Increasing concurrency in the save operation doesn't
affect that in any way.

> you can't scale out the writes in a
> cluster since all writes are serialized for the whole cluster.

Yes, this is a big problem, and we need to solve it. One idea is to
synchronize cluster nodes asynchronously, and to better support
splitting data into multiple repositories (sharding), for example using
virtual repositories that can be linked together.

Regards,
Thomas
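To illustrate the point about splitting large change sets: since saves
are serialized, each save() should hold the write path for as short a
time as possible. Below is a minimal, self-contained sketch of chunking
a change set into fixed-size batches; `BATCH_SIZE` and the use of path
strings are my own assumptions for illustration, and in real JCR code
each batch would of course end with `session.save()` rather than a
println.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedSave {
    // Hypothetical tuning knob; the right value depends on the repository.
    static final int BATCH_SIZE = 100;

    // Split a list of pending changes into batches of at most BATCH_SIZE,
    // so each (serialized) save holds the write lock for a shorter time.
    static List<List<String>> split(List<String> changes) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < changes.size(); i += BATCH_SIZE) {
            batches.add(new ArrayList<>(
                changes.subList(i, Math.min(i + BATCH_SIZE, changes.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> changes = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            changes.add("/content/node-" + i);
        }
        List<List<String>> batches = split(changes);
        // 250 changes with BATCH_SIZE 100 -> batches of 100, 100, 50
        System.out.println(batches.size());
        System.out.println(batches.get(2).size());
    }
}
```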

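As a rough sketch of the sharding idea: writes could be routed to one of
several repositories by a stable function of the content path, so that
cluster-wide write serialization only applies per shard. Everything here
(the shard count, routing by path hash, the `ShardRouter` name) is a
hypothetical illustration, not an existing Jackrabbit API.

```java
public class ShardRouter {
    // Assumed number of backing repositories (shards).
    static final int SHARDS = 4;

    // Deterministically map a content path to a shard index in [0, SHARDS).
    // floorMod keeps the result non-negative even for negative hash codes.
    static int shardFor(String path) {
        return Math.floorMod(path.hashCode(), SHARDS);
    }

    public static void main(String[] args) {
        // The same path always routes to the same shard, so reads can
        // find what writes stored.
        int a = shardFor("/content/site-a/page1");
        int b = shardFor("/content/site-a/page1");
        System.out.println(a == b);
        System.out.println(a >= 0 && a < SHARDS);
    }
}
```

A virtual repository layered on top would then dispatch each session
operation to the shard chosen by such a rule.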