Sergiy Shyrkov created JCR-3572:
-----------------------------------
Summary: Optimization in NodeImpl.getNodes(String) and
getNodes(String[])
Key: JCR-3572
URL: https://issues.apache.org/jira/browse/JCR-3572
Project: Jackrabbit Content Repository
Issue Type: Improvement
Components: jackrabbit-core
Affects Versions: 2.6, 2.5.3, 2.4.3, 2.2.13
Reporter: Sergiy Shyrkov
Priority: Minor
We were doing some refactoring/optimization of our code (we are using
Jackrabbit 2.2.x code) and replacing in some places traversal over child nodes
(node.getNodes()) with node.getNodes(namePattern) as it suppose to speed up the
things by only returning children, matching particular name pattern (in our
case those our internationalization nodes, starting with "translation-").
Profiling the code after we noticed that node.getNodes(namePattern) is actually
slower than simple traversal over all children checking their name match.
We can down to the org.apache.jackrabbit.util.ChildrenCollectorFilter which is
used in the NodeImpl.getNodes(String) and getNodes(String[])
Even if the instance of the filter is created with collectProperties set to
false the parent (javax.jcr.util.TraversingItemVisitor.visit(Node)) still reads
the properties of the nodes and iterates over them which seems useless.
Digging deeper it seems there is an optimization which can be done to
getNodes(String) and getNodes(String[]) earlier.
Those methods could call the itemMgr.getChildNodes(), like the getNodes() does,
passing additionally namePattern / nameGlobs.
The ItemManager.ChildNodeEntry() already iterates on ChildNodeEntry instances,
which have Name, so the filtering can be done here, i.e. much earlier, than in
the LazyItemIterator.prefetchNext().
The adjusted code for org.apache.jackrabbit.core.ItemManager.getChildNodes(),
we've tested on our side looks as follows:
synchronized NodeIterator getChildNodes(NodeId parentId)
throws ItemNotFoundException, AccessDeniedException,
RepositoryException {
return getChildNodesInternal(parentId, null, null);
}
synchronized NodeIterator getChildNodes(NodeId parentId, String
namePattern, String[] nameGlobs)
throws ItemNotFoundException, AccessDeniedException,
RepositoryException {
return getChildNodesInternal(parentId, namePattern, nameGlobs);
}
private NodeIterator getChildNodesInternal(NodeId parentId, String
namePattern, String[] nameGlobs)
throws ItemNotFoundException, AccessDeniedException,
RepositoryException {
sanityCheck();
ItemData data = getItemData(parentId);
if (!data.isNode()) {
String msg = "can't list child nodes of property " + parentId;
log.debug(msg);
throw new RepositoryException(msg);
}
ArrayList<ItemId> childIds = new ArrayList<ItemId>();
Iterator<ChildNodeEntry> iter = ((NodeState)
data.getState()).getChildNodeEntries().iterator();
while (iter.hasNext()) {
ChildNodeEntry entry = iter.next();
// delay check for read-access until item is being built
// thus avoid duplicate check
// check for name
if (namePattern == null && nameGlobs == null ||
namePattern != null
&&
ChildrenCollectorFilter.matches(sessionContext.getJCRName(entry.getName()),
namePattern)
|| nameGlobs != null
&&
ChildrenCollectorFilter.matches(sessionContext.getJCRName(entry.getName()),
nameGlobs)) {
childIds.add(entry.getId());
}
}
return new LazyItemIterator(sessionContext, childIds, parentId);
}
and in the NodeImpl we call ItemManager.getChildNodes() as follows:
public NodeIterator getNodes(String namePattern) throws RepositoryException
{
return getNodesInternal(namePattern, null);
}
public NodeIterator getNodes(final String[] nameGlobs)
throws RepositoryException {
return getNodesInternal(null, nameGlobs);
}
private NodeIterator getNodesInternal(final String namePattern, final
String[] nameGlobs) throws RepositoryException {
return perform(new SessionOperation<NodeIterator>() {
public NodeIterator perform(SessionContext context)
throws RepositoryException {
try {
return itemMgr.getChildNodes((NodeId) id, namePattern,
nameGlobs);
} catch (ItemNotFoundException e) {
throw new RepositoryException(
"Failed to list child nodes of " + NodeImpl.this,
e);
} catch (AccessDeniedException e) {
throw new RepositoryException(
"Failed to list child nodes of " + NodeImpl.this,
e);
}
}
public String toString() {
return "node.getNodes()";
}
});
}
Could you please provide your feedback if the optimization makes sense and
looks consistent or if there are any issues with those changes?
I could provide the patch of suggested changes. I just need to know against
which branch of Jackrabbit code it should be done.
Thank you in advance!
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira