Lukas Eder created JCR-3557:
-------------------------------

             Summary: Inefficient deletion of ChangeLog objects by 
AbstractBundlePersistenceManager
                 Key: JCR-3557
                 URL: https://issues.apache.org/jira/browse/JCR-3557
             Project: Jackrabbit Content Repository
          Issue Type: Bug
          Components: jackrabbit-core
    Affects Versions: 2.6
            Reporter: Lukas Eder
            Priority: Minor
         Attachments: Yourkit_Screenshot_SQL.png, 
Yourkit_Screenshot_Stacktrace.png

Deleting a tree of nodes in Jackrabbit results in a prohibitive amount of SQL 
queries being executed when using the BundleDbPersistenceManager. Please 
consider the attached Screenshots from Yourkit Profiler, which illustrate how 
saving the deletion of a single node through the JCR API results in 34 SQL 
DELETE statements for my test case:

- Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion. 
Please disregard the absolute values of the time column. It is biased by the 
profiling session.
- Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by the 
selected stack subtree

While other operations are probably not very optimal either, this one is a very 
low hanging fruit, in my opinion. Consider the following piece of code in 
AbstractBundlePersistenceManager.storeInternal():

{code}
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        NodePropBundle bundle = getBundle((NodeId) state.getId());
        if (bundle == null) {
            throw new NoSuchItemStateException(state.getId().toString());
        }
        deleteBundle(bundle);
        deleted.add(state.getId());
    }
}
{code}

Now, instead of iterating over deletedStates and loading / deleting them one by 
one, they should be bulk-loaded / bulk-deleted as such:

{code}
List<NodeId> nodeIds = new ArrayList<NodeId>();
for (ItemState state : changeLog.deletedStates()) {
    if (state.isNode()) {
        nodeIds.add((NodeId) state.getId());
    }
}
List<NodePropBundle> bundles = new ArrayList<NodePropBundle>();
bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException
deleteBundles(bundles);
deleted.addAll(nodeIds);
{code}

For backwards-compatibility and convenience, AbstractBundlePersistenceManager 
would provide trivial default implementations for getBundles() and 
deleteBundles():

{code}
protected List<NodePropBundle> getBundles(Collection<? extends NodeId> nodeIds) 
throws ItemStateException {
    List<NodePropBundle> result = new ArrayList<NodePropBundle>();

    for (NodeId nodeId : nodeIds) {
        result.add(getBundle(nodeId));
    }

    return result;
}

protected void deleteBundles(Collection<? extends NodePropBundle> bundles) 
throws ItemStateException {
    for (NodePropBundle bundle : bundles) {
        deleteBundle(bundle);
    }
}
{code}

But the BundleDbPersistenceManager could override this default behaviour by 
executing something like

{code}
-- getBundles()
select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...)

-- deleteBundles()
delete from BUNDLE where NODE_ID in (?, ?, ...)
{code}

The same also applies for deleting from the REFS table.

If this seems like a viable improvement and if I didn't overlook something 
subtle where one-by-one selection / deletion of bundle data is important, I 
could go ahead and provide a patch for this issue. Once such an improvement is 
in place, other bundle persistence managers might take advantage of the new 
algorithm as well. Also, we could review adding / updating bundles in another 
JIRA ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to