Hi Julian,

On Wed, Apr 18, 2012 at 11:28 AM, Julian Reschke <[email protected]> wrote:
> Hi there.
>
> (posting here instead of opening a ticket because JIRA is currently down)
>
> It appears that people are (ab)using the RepositoryChecker to fix the
> versioning information in their repo after *removing* the version storage.
> (It would be good to understand why this happens, but anyway...)
Could it be that people want to clean up their version history, since it can grow quite large over time? We have had this request several times from customers. One option would be to provide a more convenient way to clean it up properly.

> The RepositoryChecker, as currently implemented, walks the repository,
> collects changes, and, when done, submits them as a single repository
> ChangeLog.
>
> This will not work if the number of affected nodes is big.
>
> Unfortunately, the checker is currently designed to do things in two steps;
> we could of course stop collecting changes after a threshold, then apply
> what we have, then re-run the checker. That would probably work, but would
> be slow on huge repositories.
>
> The best alternative I see is to add a checkAndFix() method that is allowed
> to apply ChangeLogs to the repository on the run (and of course to use that
> variant from within RepositoryImpl.doVersionRecovery()).
>
> Feedback appreciated,
> Julian

We (@Hippo) have been doing quite a bit of work on the consistency checker lately. See the following issues:

https://issues.apache.org/jira/browse/JCR-3267
https://issues.apache.org/jira/browse/JCR-3265
https://issues.apache.org/jira/browse/JCR-3269
https://issues.apache.org/jira/browse/JCR-3277
https://issues.apache.org/jira/browse/JCR-3263

It might be interesting to see what kind of options we have to implement such an approach. We found that building a complete hierarchy tree in memory and then doing the consistency checks is by far the fastest way to run a complete check (something like 50x faster). But as noted, it requires quite a bit of memory for the check, and possibly for the fix. In our current tests we can create the in-memory model for about 3 million nodes in 1 GB of heap.

Regards,
Bart
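The checkAndFix() idea discussed above amounts to flushing repair batches while the repository walk is still running, instead of accumulating one huge ChangeLog. A minimal, self-contained sketch of that batching pattern (the class and method names here are hypothetical illustrations, not the actual Jackrabbit RepositoryChecker/ChangeLog API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer fixes found during the repository walk and
// apply them in bounded batches, so memory use stays constant regardless
// of how many nodes need repair.
class IncrementalFixer {
    private final int threshold;                   // max fixes per batch
    private final Consumer<List<String>> applier;  // stands in for "apply a ChangeLog"
    private final List<String> pending = new ArrayList<>();
    private int batchesApplied = 0;

    IncrementalFixer(int threshold, Consumer<List<String>> applier) {
        this.threshold = threshold;
        this.applier = applier;
    }

    // Called for every node the walk finds to be in need of repair.
    void addFix(String nodeId) {
        pending.add(nodeId);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    // Apply whatever is buffered; must also be called once after the walk ends.
    void flush() {
        if (!pending.isEmpty()) {
            applier.accept(new ArrayList<>(pending));
            pending.clear();
            batchesApplied++;
        }
    }

    int getBatchesApplied() { return batchesApplied; }

    public static void main(String[] args) {
        List<String> applied = new ArrayList<>();
        IncrementalFixer fixer = new IncrementalFixer(100, applied::addAll);
        for (int i = 0; i < 250; i++) {
            fixer.addFix("node-" + i);
        }
        fixer.flush(); // flush the final partial batch
        // 250 fixes with threshold 100 -> batches of 100, 100, 50
        System.out.println(applied.size() + " fixes in "
                + fixer.getBatchesApplied() + " batches");
    }
}
```

The single walk is preserved (unlike the "apply and re-run" alternative), while each batch stays small enough to submit safely; the trade-off is that the repository is briefly observable in a partially repaired state between batches.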
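As a quick back-of-envelope on the memory figure above (about 3 million nodes of in-memory model per 1 GB of heap), and assuming the per-node cost scales linearly, one can estimate the heap a full check would need for larger repositories. The numbers here are derived only from that single observation:

```java
// Back-of-envelope heap estimate for the in-memory hierarchy model,
// based on the observed figure of ~3 million nodes per 1 GiB of heap.
class HeapEstimate {
    static final long OBSERVED_NODES = 3_000_000L;
    static final long OBSERVED_HEAP_BYTES = 1L << 30; // 1 GiB

    // Approximate per-node cost of the in-memory model (integer division).
    static long bytesPerNode() {
        return OBSERVED_HEAP_BYTES / OBSERVED_NODES; // ~357 bytes/node
    }

    // Heap needed for a repository of the given size, assuming linear scaling.
    static long heapForNodes(long nodes) {
        return nodes * bytesPerNode();
    }

    public static void main(String[] args) {
        System.out.println("bytes/node: " + bytesPerNode());
        System.out.println("heap for 10M nodes: " + heapForNodes(10_000_000L) + " bytes");
    }
}
```

So a 10-million-node repository would need roughly 3.3 GiB of heap for the model alone, before accounting for the fix itself.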
