Hi,

On 11/24/06, Marcin Nowak <[EMAIL PROTECTED]> wrote:
> Recently I've performed some common tests of Jackrabbit performance. The results of these tests can be seen in the attached files. I would like to hear your comments, and your answer to point 4.4, Suggestions and wishes.
Thanks a lot for sharing the results with us! This is very interesting data. Some quick comments:
> The most important thing to us is to improve the performance of importing XML documents into the repository and to reduce the RAM overhead to, let's say, 20x. Moreover, we are interested in importing 20 MB XML documents in 10 minutes, with memory usage not exceeding 400 MB of RAM and CPU usage low enough to let other users keep working with the repository normally.
The limiting factor with importing large XML documents is that the entire set of changes is built within the transient space of a session before being persisted. And since a Jackrabbit NodeState is even bigger than your average DOM element node, your memory usage will grow rapidly. The preferred alternative I've been using for importing large XML documents is to create a custom importer class that calls Session.save() every now and then to persist the pending changes. You need to be careful to avoid inconsistencies like broken references with this approach, but otherwise it works fine.
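To illustrate the idea, here is a minimal sketch of such an importer (not the actual Jackrabbit importer): it assumes unstructured content where each XML element becomes a child node and attributes become string properties, and it omits namespace and reference handling. The class name and batch size are arbitrary; tune the batch size against available memory.

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class BatchedXmlImporter extends DefaultHandler {

    private static final int BATCH_SIZE = 1000; // illustrative, tune to memory

    private final Session session;
    private Node current;     // node for the currently open element
    private long pending = 0; // transient nodes since the last save

    public BatchedXmlImporter(Session session, Node parent) {
        this.session = session;
        this.current = parent;
    }

    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        try {
            current = current.addNode(qName); // element -> child node
            for (int i = 0; i < atts.getLength(); i++) {
                current.setProperty(atts.getQName(i), atts.getValue(i));
            }
            if (++pending >= BATCH_SIZE) {
                session.save(); // flush transient space, keeps memory flat
                pending = 0;
            }
        } catch (RepositoryException e) {
            throw new SAXException(e);
        }
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        try {
            current = current.getParent(); // pop back to the parent node
        } catch (RepositoryException e) {
            throw new SAXException(e);
        }
    }

    public void endDocument() throws SAXException {
        try {
            session.save(); // persist the final partial batch
        } catch (RepositoryException e) {
            throw new SAXException(e);
        }
    }
}

You would feed this handler to a standard SAX parser (javax.xml.parsers.SAXParserFactory) instead of calling Session.importXML().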
> Also very important to us is the server CPU usage when deleting repository content. Currently this operation loads the server at 100% and blocks all other actions, preventing other users from doing anything with the repository.
Large deletes are similarly expensive in that all the deleted states need to be loaded into the transient space before Session.save() gets called. Deletes also fire a number of internal consistency checks, for example to ensure that no broken references are left around. Here as well I'd recommend breaking large deletes into a sequence of smaller operations.
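As a rough sketch of what I mean (the per-subtree save is deliberately conservative; you could batch several removals per save, as long as no references point into the deleted subtree):

import javax.jcr.Node;
import javax.jcr.RepositoryException;
import javax.jcr.Session;

public class BatchedDelete {

    // Removes the subtree rooted at the given node, persisting each
    // removal separately so the transient space stays small.
    public static void delete(Session session, Node node)
            throws RepositoryException {
        // Recurse first so that each remove() only covers a small subtree.
        while (node.hasNodes()) {
            delete(session, node.getNodes().nextNode());
        }
        node.remove();
        session.save();
    }
}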
> It is also crucial to us to be able to export a repository of 100 MB to an XML file and to restore the whole repository from that file later.
Unless you are able to use a custom importer, I would rather recommend backing up and restoring the entire repository directory instead. This approach requires careful synchronization, ideally a repository shutdown during backup and restore, but it avoids the performance issues of exporting and importing large XML documents. There are some improvements we could make to the handling of large XML imports, but none of them are easy to implement, especially since we need to maintain spec compliance. There is an ongoing effort to refactor parts of the XML importer mechanism, and one outcome of that effort could be to make it easier to write custom importers for large XML documents.
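A minimal sketch of such a cold backup, assuming you hold a reference to the Jackrabbit RepositoryImpl and know the repository home directory (both paths here are placeholders):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;
import org.apache.jackrabbit.core.RepositoryImpl;

public class ColdBackup {

    public static void backup(RepositoryImpl repository,
            Path repositoryHome, Path backupDir) throws IOException {
        repository.shutdown(); // make sure nothing writes during the copy
        try (Stream<Path> paths = Files.walk(repositoryHome)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = backupDir.resolve(repositoryHome.relativize(source));
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(source, target,
                            StandardCopyOption.REPLACE_EXISTING);
                }
            }
        }
        // Restoring is the reverse: copy the backup over an empty
        // repository home and start the repository again.
    }
}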
> We would also like to suggest improving network usage when communicating with the server.
The JCR-RMI layer is inherently quite verbose since it maps almost all JCR API calls to serialized RMI method invocations. There are some tricks we could use to speed things up and reduce network usage, but the ongoing SPI effort seems to offer a much better alternative for remote repository access, so JCR-RMI is currently not being actively improved.
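To illustrate the chattiness, here is a minimal JCR-RMI client sketch (the RMI URL, credentials, and path are placeholders); every JCR call below translates to a separate remote method invocation:

import javax.jcr.Node;
import javax.jcr.Repository;
import javax.jcr.Session;
import javax.jcr.SimpleCredentials;
import org.apache.jackrabbit.rmi.client.ClientRepositoryFactory;

public class RmiClientExample {

    public static void main(String[] args) throws Exception {
        Repository repository = new ClientRepositoryFactory()
                .getRepository("rmi://localhost:1099/jackrabbit");
        Session session = repository.login(
                new SimpleCredentials("user", "password".toCharArray()));
        try {
            Node root = session.getRootNode();       // one network round trip
            Node child = root.getNode("some/path");  // another round trip
            System.out.println(child.getPath());     // and another
        } finally {
            session.logout();
        }
    }
}

BR,

Jukka Zitting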
