Hi Andreas,
Now, a news message [1] on TheServerSide about benchmarks provided by Alfresco to prove the superiority
ermhh.... let's say "state" not "prove" ;)
...of their JCR implementation raises some concerns.
I guess this may have been exactly the intention ;) Also, the term "JCR implementation" may not be technically accurate. Maybe someone could point me to an updated version of this: http://wiki.alfresco.com/w/index.php?title=JSR-170_Compliance
A post in the thread claims that Jackrabbit isn't suited for large-scale scenarios and faces some problems in the transactional handling of some 100.000 nodes (Kev Smith, [2]):
While Kev possibly has reasons to believe that, I don't. (Unless he is talking about some 100k nodes in a single transaction with a given memory size.)
"From what we've seen, Alfresco is comparable to JackRabbit for small case scenarios - but Alfresco is much more scalable [...]" Do you agree with this statement? If yes - are these problems related to the persistence manager abstraction? Is this a known issue, and will it be addressed?
I do not even remotely agree with this statement. Jackrabbit has been built to scale freely in size. I have a hard time understanding this argument since both Jackrabbit and Alfresco can use the same RDBMS as the persistence layer, so at least on the persistence layer there should not be a substantial difference. Thoughts?
"We tried to load up JackRabbit with millions of nodes but always ran into blocker issues after about 2 million or so objects. Also when loading up JackRabbit, the load needed to be carefully performed in small chunks e.g. trying to load in 100,000 nodes at a time would cause PermGenSpace errors (even with a HUGE permgenspace!) and potentially place the repo into a non-recoverable state." I'm not sure if this will really be an issue for our usage scenario (except maybe for restoring backups), but I'm very interested in your opinions.
That's true, the size of the non-binary portion of a commit is "currently" memory-constrained. Backup and restore operations, in my experience, usually happen on the persistence layer, which means that a restore operation (obviously) does not go through the normal user API. I would actually go as far as saying that it would be close to abuse of the API to go through the transient layer to restore an entire content repository. We are currently working on a solution for the memory constraint, but since nobody has had a pressing need, it has had a relatively low priority. If this is a pressing issue for your project, feel free to file a JIRA issue.

regards,
david
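
P.S. Since the size of a single commit is what is memory-constrained, a bulk load through the API is usually written so that it saves every N additions rather than once at the end. Below is a minimal sketch of that pattern only. The PendingChanges interface, the InMemorySession class and the chunk size of 1000 are all made up for illustration and are not part of Jackrabbit or the JCR API; in a real JCR client the two calls would roughly correspond to Node.addNode(...) and Session.save().

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedLoad {

    /** Hypothetical stand-in for a repository session; NOT the JCR API. */
    interface PendingChanges {
        void add(String nodeName); // queue a change in memory
        void save();               // persist queued changes and clear them
    }

    /**
     * Adds all names, saving after every chunkSize additions so that the
     * number of unsaved changes held in memory never exceeds chunkSize.
     * Returns the number of save calls that were made.
     */
    static int load(List<String> names, PendingChanges session, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        int saves = 0;
        int pending = 0;
        for (String name : names) {
            session.add(name);
            pending++;
            if (pending == chunkSize) {
                session.save();
                saves++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the last, partially filled chunk
            session.save();
            saves++;
        }
        return saves;
    }

    /** Toy in-memory session that records how many changes were pending at peak. */
    static class InMemorySession implements PendingChanges {
        final List<String> pending = new ArrayList<>();
        int stored = 0;
        int peakPending = 0;

        public void add(String nodeName) {
            pending.add(nodeName);
            peakPending = Math.max(peakPending, pending.size());
        }

        public void save() {
            stored += pending.size();
            pending.clear();
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            names.add("node" + i);
        }
        InMemorySession session = new InMemorySession();
        int saves = load(names, session, 1000);
        System.out.println(saves + " saves, " + session.stored
                + " stored, peak pending " + session.peakPending);
    }
}
```

The right chunk size depends on the node sizes and the available heap, so treat the number above as a placeholder. Note also that each chunk is its own commit, so a failure halfway through leaves a partially loaded repository that the loader has to be able to resume or clean up.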
