[
https://issues.apache.org/jira/browse/JCR-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13621996#comment-13621996
]
Lukas Eder commented on JCR-3557:
---------------------------------
... note, I'm aware of some implementations (including the one from the
screenshot) using NODE_ID_HI and NODE_ID_LO, which makes using the "IN
predicate" a bit more tricky. Some databases support tuples, e.g.
(NODE_ID_HI, NODE_ID_LO) IN ((?, ?), (?, ?), (..., ...), ...)
Others would have to resort to "expanding" the tuples:
(NODE_ID_HI = ? AND NODE_ID_LO = ?) OR (NODE_ID_HI = ? AND NODE_ID_LO = ?)
OR (...)
Besides, there are lots of other limits to consider:
- Oracle's IN predicate can only contain 1000 elements
- SQL Server and Sybase have a limit of around 2000 bind variables per query
See also:
http://stackoverflow.com/q/11266609/521799
> Inefficient deletion of ChangeLog objects by AbstractBundlePersistenceManager
> -----------------------------------------------------------------------------
>
> Key: JCR-3557
> URL: https://issues.apache.org/jira/browse/JCR-3557
> Project: Jackrabbit Content Repository
> Issue Type: Bug
> Components: jackrabbit-core
> Affects Versions: 2.6
> Reporter: Lukas Eder
> Priority: Minor
> Attachments: Yourkit_Screenshot_SQL.png,
> Yourkit_Screenshot_Stacktrace.png
>
>
> Deleting a tree of nodes in Jackrabbit results in a prohibitive amount of SQL
> queries being executed when using the BundleDbPersistenceManager. Please
> consider the attached Screenshots from Yourkit Profiler, which illustrate how
> saving the deletion of a single node through the JCR API results in 34 SQL
> DELETE statements for my test case:
> - Yourkit_Screenshot_Stacktrace.png: The stack trace from a single deletion.
> Please disregard the absolute values of the time column. It is biased by the
> profiling session.
> - Yourkit_Screenshot_SQL.png: An extract of the 34 SQL statements issued by
> the selected stack subtree
> While other operations are probably not very optimal either, this one is a
> very low hanging fruit, in my opinion. Consider the following piece of code
> in AbstractBundlePersistenceManager.storeInternal():
> {code}
> for (ItemState state : changeLog.deletedStates()) {
> if (state.isNode()) {
> NodePropBundle bundle = getBundle((NodeId) state.getId());
> if (bundle == null) {
> throw new NoSuchItemStateException(state.getId().toString());
> }
> deleteBundle(bundle);
> deleted.add(state.getId());
> }
> }
> {code}
> Now, instead of iterating over deletedStates and loading / deleting them one
> by one, they should be bulk-loaded / bulk-deleted as such:
> {code}
> List<NodeId> nodeIds = new ArrayList<NodeId>();
> for (ItemState state : changeLog.deletedStates()) {
> if (state.isNode()) {
> nodeIds.add((NodeId) state.getId());
> }
> }
> List<NodePropBundle> bundles = new ArrayList<NodePropBundle>();
> bundles.addAll(getBundles(nodeIds)); // Can throw NoSuchItemStateException
> deleteBundles(bundles);
> deleted.addAll(nodeIds);
> {code}
> For backwards-compatibility and convenience, AbstractBundlePersistenceManager
> would provide trivial default implementations for getBundles() and
> deleteBundles():
> {code}
> protected List<NodePropBundle> getBundles(Collection<? extends NodeId>
> nodeIds) throws ItemStateException {
> List<NodePropBundle> result = new ArrayList<NodePropBundle>();
> for (NodeId nodeId : nodeIds) {
> result.add(getBundle(nodeId));
> }
> return result;
> }
> protected void deleteBundles(Collection<? extends NodePropBundle> bundles)
> throws ItemStateException {
> for (NodePropBundle bundle : bundles) {
> deleteBundle(bundle);
> }
> }
> {code}
> But the BundleDbPersistenceManager could override this default behaviour by
> executing something like
> {code}
> -- getBundles()
> select NODE_ID, BUNDLE_DATA from BUNDLE where NODE_ID in (?, ?, ...)
> -- deleteBundles()
> delete from BUNDLE where NODE_ID in (?, ?, ...)
> {code}
> The same also applies for deleting from the REFS table.
> If this seems like a viable improvement and if I didn't overlook something
> subtle where one-by-one selection / deletion of bundle data is important, I
> could go ahead and provide a patch for this issue. Once such an improvement
> is in place, other bundle persistence managers might take advantage of the
> new algorithm as well. Also, we could review adding / updating bundles in
> another JIRA ticket.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira