On 2012-04-18 13:05, Bart van der Schans wrote:
On Wed, Apr 18, 2012 at 12:26 PM, Julian Reschke<[email protected]> wrote:
On 2012-04-18 11:51, Bart van der Schans wrote:
Hi Julian,
On Wed, Apr 18, 2012 at 11:28 AM, Julian Reschke<[email protected]> wrote:
Hi there.
(posting here instead of opening a ticket because JIRA is currently down)
It appears that people are (ab)using the RepositoryChecker to fix the
versioning information in their repo after *removing* the version
storage.
(It would be good to understand why this happens, but anyway...)
Could it be that people want to clean up their version history, as it
can grow quite large over time? We have had this request several times
from customers. One option would be to provide a more convenient way to
clean it up properly.
Maybe (but not in this case).
The RepositoryChecker, as currently implemented, walks the repository,
collects changes, and, when done, submits them as a single repository
ChangeLog.
This will not work if the number of affected nodes is large.
Unfortunately, the checker is currently designed to do things in two
steps; we could of course stop collecting changes after a threshold, then
apply what we have, then re-run the checker. That would probably work, but
would be slow on huge repositories.
The best alternative I see is to add a checkAndFix() method that is
allowed to apply ChangeLogs to the repository on the fly (and of course to
use that variant from within RepositoryImpl.doVersionRecovery()).
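
To make the idea concrete, here is a rough sketch of the "apply while
walking" variant, not actual Jackrabbit code. Everything in it is a
hypothetical stand-in: the class BatchingCheckerSketch, the Fix type, the
isBroken predicate (standing in for the real versioning checks) and the
applyBatch callback (standing in for submitting one ChangeLog). The only
point is that pending changes are flushed every batchSize fixes, so memory
use is bounded by the batch size rather than by the number of affected
nodes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Hypothetical sketch only: none of these names are Jackrabbit API.
final class BatchingCheckerSketch {

    /** Stand-in for one repair the checker wants to make to a node. */
    static final class Fix {
        final String nodeId;

        Fix(String nodeId) {
            this.nodeId = nodeId;
        }
    }

    /**
     * Walks the given node ids, records a Fix for every node the check
     * flags, and hands the pending fixes to applyBatch as soon as
     * batchSize of them have accumulated. Any remainder is flushed once
     * the walk is done. Returns the total number of fixes applied.
     */
    static int checkAndFix(Iterable<String> nodeIds,
                           Predicate<String> isBroken,
                           int batchSize,
                           Consumer<List<Fix>> applyBatch) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        List<Fix> pending = new ArrayList<>();
        int applied = 0;
        for (String id : nodeIds) {
            if (!isBroken.test(id)) {
                continue;
            }
            pending.add(new Fix(id));
            if (pending.size() >= batchSize) {
                // Pass a copy so the callback may keep the list.
                applyBatch.accept(new ArrayList<>(pending));
                applied += pending.size();
                pending.clear();
            }
        }
        if (!pending.isEmpty()) {
            // Flush whatever is left after the walk.
            applyBatch.accept(new ArrayList<>(pending));
            applied += pending.size();
        }
        return applied;
    }
}
```

A batchSize of 1 degenerates to one submission per fix, and a very large
value reproduces today's collect-everything-then-submit behaviour, so the
same method covers both ends.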
Feedback appreciated, Julian
We (@Hippo) have been doing quite a bit of work on the consistency
checker lately. See the following issues:
https://issues.apache.org/jira/browse/JCR-3267
https://issues.apache.org/jira/browse/JCR-3265
https://issues.apache.org/jira/browse/JCR-3269
https://issues.apache.org/jira/browse/JCR-3277
https://issues.apache.org/jira/browse/JCR-3263
Saw that (and sorry for not providing feedback yet). But that was about the
*ConsistencyChecker*, not the *RepositoryChecker*, right? (The latter fixes
versioning inconsistencies, so it operates at a higher level).
It's about both ;-)
I hope to have some free cycles soon to go over the issues. Some are
straightforward fixes and some involve larger changes that probably
need input from other developers as well.
To provide some background information: we have had some serious
issues with inconsistencies in the repository with several customers.
We've invested quite some time in tracking down the root cause of
these problems (I will send an email about that shortly) and creating
a standalone checker that can quickly check and fix all current
inconsistencies. We can now check and fix millions of nodes in a
matter of minutes, although this comes at the cost of considerable memory
usage. The existing checks also didn't find all inconsistencies, so we
improved some checks and added others.
...
Note that we have a test case for repository fixes (see
AutoFixCorruptNode); it would probably be good to have test coverage for
any functionality you're adding...
Best regards, Julian