[jira] [Commented] (JCR-3557) Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager

Lukas Eder (JIRA) Thu, 04 Apr 2013 03:15:17 -0700

    [ https://issues.apache.org/jira/browse/JCR-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621996#comment-13621996 ]


Lukas Eder commented on JCR-3557:
---------------------------------

Note: I'm aware that some implementations (including the one in the 
screenshot) use two columns, NODE_ID_HI and NODE_ID_LO, which makes the IN 
predicate a bit trickier to use. Some databases support tuples, e.g.

    (NODE_ID_HI, NODE_ID_LO) IN ((?, ?), (?, ?), (..., ...), ...)

Others would have to resort to "expanding" the tuples:

    (NODE_ID_HI = ? AND NODE_ID_LO = ?) OR (NODE_ID_HI = ? AND NODE_ID_LO = ?)
    OR (...)

Besides, there are lots of other limits to consider:

- Oracle's IN predicate can only contain 1000 elements
- SQL Server and Sybase have a limit of around 2000 bind variables per query

See also:
http://stackoverflow.com/q/11266609/521799
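
The two predicate shapes and the limits above can be sketched roughly as 
follows. This is only an illustration, not Jackrabbit code: the class name, 
the method names and the chunk size of 500 IDs (1000 bind variables per 
statement) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackrabbit.
class InPredicateSketch {

    // Assumed chunk size: 500 IDs = 1000 bind variables per statement,
    // chosen to stay below the limits listed above.
    static final int MAX_IDS_PER_STATEMENT = 500;

    // For databases that support tuples in the IN predicate.
    // Callers must pass count >= 1; "IN ()" is not valid SQL.
    static String tupleIn(int count) {
        StringBuilder sb = new StringBuilder("(NODE_ID_HI, NODE_ID_LO) IN (");
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append("(?, ?)");
        }
        return sb.append(")").toString();
    }

    // Fallback for databases without tuple support: expand to OR-ed pairs.
    static String expandedOr(int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                sb.append(" OR ");
            }
            sb.append("(NODE_ID_HI = ? AND NODE_ID_LO = ?)");
        }
        return sb.toString();
    }

    // Splits the IDs into chunks so that each statement stays below the limit.
    // An empty input yields no chunks, so no statement is built for it.
    static <T> List<List<T>> partition(List<T> ids, int size) {
        List<List<T>> chunks = new ArrayList<List<T>>();
        for (int from = 0; from < ids.size(); from += size) {
            int to = Math.min(from + size, ids.size());
            chunks.add(new ArrayList<T>(ids.subList(from, to)));
        }
        return chunks;
    }
}
```

A caller would partition the node IDs with MAX_IDS_PER_STATEMENT, build one 
statement per chunk, and bind the high and low parts of each ID in order.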
                
> Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
> -----------------------------------------------------------------------------
>
>                 Key: JCR-3557
>                 URL: https://issues.apache.org/jira/browse/JCR-3557
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-core
>    Affects Versions: 2.6
>            Reporter: Lukas Eder
>            Priority: Minor
>         Attachments: Yourkit_Screenshot_SQL.png, 
> Yourkit_Screenshot_Stacktrace.png
>
>
> Deleting a tree of nodes in Jackrabbit results in a prohibitive number of SQL 
> queries being executed when using the BundleDbPersistenceManager. Please 
> consider the attached screenshots from the YourKit profiler, which illustrate 
> how saving the deletion of a single node through the JCR API results in 34 SQL 
> DELETE statements in my test case:
> - Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion. 
> Please disregard the absolute values in the time column; they are skewed by 
> the profiling session.
> - Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by 
> the selected stack subtree.
>
> While other operations are probably suboptimal as well, this one is 
> low-hanging fruit, in my opinion. Consider the following piece of code 
> in AbstractBundlePersistenceManager.storeInternal():
> {code}
> for (ItemState state : changeLog.deletedStates()) {
>     if (state.isNode()) {
>         NodePropBundle bundle = getBundle((NodeId) state.getId());
>         if (bundle == null) {
>             throw new NoSuchItemStateException(state.getId().toString());
>         }
>         deleteBundle(bundle);
>         deleted.add(state.getId());
>     }
> }
> {code}
> Instead of iterating over deletedStates() and loading / deleting the bundles 
> one by one, the bundles should be bulk-loaded / bulk-deleted like this:
> {code}
> List<NodeId> nodeIds = new ArrayList<NodeId>();
> for (ItemState state : changeLog.deletedStates()) {
>     if (state.isNode()) {
>         nodeIds.add((NodeId) state.getId());
>     }
> }
> List<NodePropBundle> bundles = new ArrayList<NodePropBundle>();
> bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException
> deleteBundles(bundles);
> deleted.addAll(nodeIds);
> {code}
> For backwards-compatibility and convenience, AbstractBundlePersistenceManager 
> would provide trivial default implementations for getBundles() and 
> deleteBundles():
> {code}
> protected List<NodePropBundle> getBundles(Collection<? extends NodeId> nodeIds)
>         throws ItemStateException {
>     List<NodePropBundle> result = new ArrayList<NodePropBundle>();
>     for (NodeId nodeId : nodeIds) {
>         NodePropBundle bundle = getBundle(nodeId);
>         if (bundle == null) {
>             // Keep the behaviour of the original one-by-one loop
>             throw new NoSuchItemStateException(nodeId.toString());
>         }
>         result.add(bundle);
>     }
>     return result;
> }
> protected void deleteBundles(Collection<? extends NodePropBundle> bundles)
>         throws ItemStateException {
>     for (NodePropBundle bundle : bundles) {
>         deleteBundle(bundle);
>     }
> }
> {code}
> But the BundleDbPersistenceManager could override this default behaviour by 
> executing something like
> {code}
> -- getBundles()
> select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...)
> -- deleteBundles()
> delete from BUNDLE where NODE_ID in (?, ?, ...)
> {code}
> The same applies to deleting from the REFS table.
> If this seems like a viable improvement and if I haven't overlooked something 
> subtle where one-by-one selection / deletion of bundle data is important, I 
> could go ahead and provide a patch for this issue. Once such an improvement 
> is in place, other bundle persistence managers might take advantage of the 
> new algorithm as well. Also, we could review adding / updating bundles in 
> another JIRA ticket.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
