On 2012-04-18 11:51, Bart van der Schans wrote:
Hi Julian,

On Wed, Apr 18, 2012 at 11:28 AM, Julian Reschke<[email protected]>  wrote:
Hi there.

(posting here instead of opening a ticket because JIRA is currently down)

It appears that people are (ab)using the RepositoryChecker to fix the
versioning information in their repo after *removing* the version storage.
(It would be good to understand why this happens, but anyway...)

Could it be that people want to clean up their version history, as it
can grow quite large over time? We have had this request several times
from customers. An option could be to provide a more convenient way to
clean it up properly.

Maybe (but not in this case).

The RepositoryChecker, as currently implemented, walks the repository,
collects changes, and, when done, submits them as a single repository
ChangeLog.

This will not work if the number of affected nodes is big.

Unfortunately, the checker is currently designed to do things in two steps;
we could of course stop collecting changes after a threshold, then apply
what we have, then re-run the checker. That would probably work, but would
be slow on huge repositories.

The best alternative I see is to add a checkAndFix() method that is allowed
to apply ChangeLogs to the repository on the run (and of course to use that
variant from within RepositoryImpl.doVersionRecovery()).
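
To make the idea concrete, here is a minimal sketch of applying fixes in bounded batches while the walk is still running. It is not Jackrabbit code: BatchingFixer, the threshold and the applier callback are hypothetical stand-ins for "collect changes" and "apply a ChangeLog".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch; none of these names are Jackrabbit API.
 * Collects pending fixes and applies them in bounded batches
 * instead of one large change set at the end of the walk.
 */
final class BatchingFixer<T> {
    private final int threshold;
    // Assumption: stands in for "apply one ChangeLog to the repository".
    private final Consumer<List<T>> applier;
    private final List<T> pending = new ArrayList<>();
    private int batchesApplied = 0;

    BatchingFixer(int threshold, Consumer<List<T>> applier) {
        if (threshold < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.threshold = threshold;
        this.applier = applier;
    }

    /** Record one fix and flush as soon as the batch is full. */
    void add(T fix) {
        pending.add(fix);
        if (pending.size() >= threshold) {
            flush();
        }
    }

    /** Apply whatever is pending; call once more after the walk ends. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        // Hand over a copy so the applier cannot see later mutations.
        applier.accept(new ArrayList<>(pending));
        pending.clear();
        batchesApplied++;
    }

    int getBatchesApplied() {
        return batchesApplied;
    }

    int getPendingCount() {
        return pending.size();
    }
}
```

With a threshold in place, memory use is bounded by the batch size rather than by the number of affected nodes, and the walk does not need to be restarted after each batch.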

Feedback appreciated, Julian

We (@Hippo) have been doing quite a bit of work on the consistency
checker lately. See the following issues:

https://issues.apache.org/jira/browse/JCR-3267
https://issues.apache.org/jira/browse/JCR-3265
https://issues.apache.org/jira/browse/JCR-3269
https://issues.apache.org/jira/browse/JCR-3277
https://issues.apache.org/jira/browse/JCR-3263

Saw that (and sorry for not providing feedback yet). But that was about the *ConsistencyChecker*, not the *RepositoryChecker*, right? (The latter fixes versioning inconsistencies, so it operates at a higher level.)

It might be interesting to see what kind of options we have to
implement such an approach. We found that building a complete
hierarchy tree in memory and then doing the consistency checks is by
far the fastest way to run a complete check (something like 50 times
faster). But as noted it will require quite some memory for the check
and possibly for the fix. In our current tests we can create the
in-memory model for about 3 million nodes in 1 GB of heap.
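
As a rough illustration of the in-memory approach, the sketch below keeps only a child-to-parent map and reports nodes whose parent is missing. It is an assumption-laden toy, not the actual checker: InMemoryHierarchy, addNode and findOrphans are hypothetical names, and a real model would also track child lists and other references.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the Jackrabbit consistency checker. */
final class InMemoryHierarchy {
    // child id -> parent id; the root maps to null
    private final Map<String, String> parentOf = new HashMap<>();

    /** Register a node and its recorded parent (null for the root). */
    void addNode(String id, String parentId) {
        parentOf.put(id, parentId);
    }

    /** Return, in sorted order, nodes whose recorded parent is not in the model. */
    List<String> findOrphans() {
        List<String> orphans = new ArrayList<>();
        for (Map.Entry<String, String> entry : parentOf.entrySet()) {
            String parent = entry.getValue();
            if (parent != null && !parentOf.containsKey(parent)) {
                orphans.add(entry.getKey());
            }
        }
        Collections.sort(orphans);
        return orphans;
    }
}
```

Once the map is loaded, each check is a hash lookup with no round trip to the persistence layer, which is where the speed-up would come from; the price is holding every node id in heap.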

Regards,
Bart

