Consistency checker performance improvements
--------------------------------------------

                 Key: JCR-3263
                 URL: https://issues.apache.org/jira/browse/JCR-3263
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
            Reporter: Unico Hommes


Currently the consistency checker loads a batch of node ids and, for each node
id, separately fetches the corresponding bundle, its child bundles and its
parent bundle. As a result the checker performs suboptimally and may take
hours (days?) to complete on large repositories.
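To make the cost of this access pattern concrete, here is a simplified sketch.
It is an illustration under stated assumptions, not the actual checker code:
`Bundle`, `CountingStore` and `PerNodeChecker` are hypothetical names, and a
bundle is reduced to an id, a parent id and child ids. The store counts every
fetch. In a consistent tree of N nodes this pattern performs 3N - 2 fetches,
because every non-root node is read once for itself, once as a child of its
parent and once as the parent of each of its children.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for a node prop bundle:
// only the node id, its parent id (null for the root) and its child ids.
record Bundle(String id, String parentId, List<String> childIds) {}

// Hypothetical store that counts how often a bundle is fetched.
class CountingStore {
    private final Map<String, Bundle> bundles = new HashMap<>();
    int loads = 0;

    void put(Bundle b) {
        bundles.put(b.id(), b);
    }

    Bundle load(String id) {
        loads++; // in a real persistence manager each call is a separate read
        return bundles.get(id);
    }
}

class PerNodeChecker {
    // Mirrors the pattern described above: for every node id, fetch the
    // node's bundle, then each child bundle and the parent bundle separately.
    static void check(CountingStore store, List<String> nodeIds) {
        for (String id : nodeIds) {
            Bundle node = store.load(id);
            for (String childId : node.childIds()) {
                Bundle child = store.load(childId);
                if (child == null || !id.equals(child.parentId())) {
                    System.out.println("inconsistent child " + childId + " of " + id);
                }
            }
            if (node.parentId() != null) {
                Bundle parent = store.load(node.parentId());
                if (parent == null || !parent.childIds().contains(id)) {
                    System.out.println("inconsistent parent of " + id);
                }
            }
        }
    }
}
```

For a root with children "a" and "b", where "a" has one child "c" (N = 4),
checking all four ids issues 10 fetches.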

I have been able to make the checker run about 20 times faster on my local
machine by loading node prop bundles in batches. For a workspace with 17000
nodes, the current implementation ran for about 23 seconds, whereas with my
enhancements it finished in 1.2 seconds.

The problem is that loading node prop bundles in batches may require a lot of
memory, and how much memory a given batch size needs is hard to predict,
because the sizes of the individual bundles vary widely.

Also, a node prop bundle contains much more information than a consistency
check needs.

Ideally, we would introduce a new type, call it NodeInfo, that contains only
the structural information the checker needs to do its work: the node id, the
parent id and the child ids. To allow for a possible future referential
integrity check, it could perhaps also hold the node's reference-type
properties.
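A minimal sketch of what such a type could look like, assuming plain `String`
ids as a stand-in for Jackrabbit's `NodeId` so the example compiles on its own.
The component names, the `referenceTargets` field (ids targeted by
reference-type properties) and the `isRoot()` helper are hypothetical choices
for illustration, not an agreed design.

```java
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the proposed NodeInfo type. String ids stand in for
// Jackrabbit's NodeId to keep the example self-contained.
record NodeInfo(
        String id,
        String parentId,              // null for the root node
        List<String> childIds,        // ids of the child nodes
        Set<String> referenceTargets  // optional: ids targeted by reference-type properties
) {
    NodeInfo {
        // Defensive, immutable copies: the checker only ever reads these values.
        childIds = List.copyOf(childIds);
        referenceTargets = Set.copyOf(referenceTargets);
    }

    boolean isRoot() {
        return parentId == null;
    }
}
```

Because it holds only ids, the size of each NodeInfo grows with the number of
children and references, not with property values, which keeps the memory cost
per batch far more predictable than that of a full bundle.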

The IterablePersistenceManager interface would then get an additional method:
Map<NodeId, NodeInfo> getAllNodeInfos();
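As a rough illustration of how a checker might use such a method, here is a
sketch that verifies parent/child links entirely against the returned map. It
assumes `String` keys in place of `NodeId`; `NodeInfoSource` is a hypothetical
single-method interface standing in for the proposed addition to
IterablePersistenceManager, and `StructuralChecker` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Minimal NodeInfo for this sketch (String ids instead of NodeId).
record NodeInfo(String id, String parentId, List<String> childIds) {}

// Hypothetical stand-in for the proposed getAllNodeInfos() method.
interface NodeInfoSource {
    Map<String, NodeInfo> getAllNodeInfos();
}

class StructuralChecker {
    // Every lookup hits the in-memory map, so no further storage reads are
    // needed once getAllNodeInfos() has returned.
    static List<String> check(NodeInfoSource source) {
        Map<String, NodeInfo> infos = source.getAllNodeInfos();
        List<String> problems = new ArrayList<>();
        for (NodeInfo info : infos.values()) {
            // Each listed child must exist and point back to this node.
            for (String childId : info.childIds()) {
                NodeInfo child = infos.get(childId);
                if (child == null) {
                    problems.add("missing child " + childId + " of " + info.id());
                } else if (!info.id().equals(child.parentId())) {
                    problems.add("child " + childId + " of " + info.id()
                            + " has parent " + child.parentId());
                }
            }
            // The parent (if any) must exist and list this node as a child.
            if (info.parentId() != null) {
                NodeInfo parent = infos.get(info.parentId());
                if (parent == null) {
                    problems.add("missing parent " + info.parentId() + " of " + info.id());
                } else if (!parent.childIds().contains(info.id())) {
                    problems.add(info.id() + " is not listed by its parent " + info.parentId());
                }
            }
        }
        return problems;
    }
}
```

With this shape each node is read from storage once, instead of the separate
node, child and parent fetches the current checker performs.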

If this proposal is acceptable, I would like to work on it and contribute a
patch.
