Sergiy Shyrkov created JCR-3572:
-----------------------------------

             Summary: Optimization in NodeImpl.getNodes(String) and 
getNodes(String[])
                 Key: JCR-3572
                 URL: https://issues.apache.org/jira/browse/JCR-3572
             Project: Jackrabbit Content Repository
          Issue Type: Improvement
          Components: jackrabbit-core
    Affects Versions: 2.6, 2.5.3, 2.4.3, 2.2.13
            Reporter: Sergiy Shyrkov
            Priority: Minor


We were doing some refactoring/optimization of our code (we are using 
Jackrabbit 2.2.x code) and replacing in some places traversal over child nodes 
(node.getNodes()) with node.getNodes(namePattern) as it suppose to speed up the 
things by only returning children, matching particular name pattern (in our 
case those our internationalization nodes, starting with "translation-").

Profiling the code after we noticed that node.getNodes(namePattern) is actually 
slower than simple traversal over all children checking their name match.

We can down to the org.apache.jackrabbit.util.ChildrenCollectorFilter which is 
used in the NodeImpl.getNodes(String) and getNodes(String[])

Even if the instance of the filter is created with collectProperties set to 
false the parent (javax.jcr.util.TraversingItemVisitor.visit(Node)) still reads 
the properties of the nodes and iterates over them which seems useless.

Digging deeper it seems there is an optimization which can be done to 
getNodes(String) and getNodes(String[]) earlier.

Those methods could call the itemMgr.getChildNodes(), like the getNodes() does, 
passing additionally namePattern / nameGlobs.
The ItemManager.ChildNodeEntry() already iterates on ChildNodeEntry instances, 
which have Name, so the filtering can be done here, i.e. much earlier, than in 
the LazyItemIterator.prefetchNext().

The adjusted code for org.apache.jackrabbit.core.ItemManager.getChildNodes(), 
we've tested on our side looks as follows:

    synchronized NodeIterator getChildNodes(NodeId parentId)
            throws ItemNotFoundException, AccessDeniedException, 
RepositoryException {
        return getChildNodesInternal(parentId, null, null);
    }

    synchronized NodeIterator getChildNodes(NodeId parentId, String 
namePattern, String[] nameGlobs)
            throws ItemNotFoundException, AccessDeniedException, 
RepositoryException {
        return getChildNodesInternal(parentId, namePattern, nameGlobs);
    }

    private NodeIterator getChildNodesInternal(NodeId parentId, String 
namePattern, String[] nameGlobs)
            throws ItemNotFoundException, AccessDeniedException, 
RepositoryException {
        sanityCheck();

        ItemData data = getItemData(parentId);
        if (!data.isNode()) {
            String msg = "can't list child nodes of property " + parentId;
            log.debug(msg);
            throw new RepositoryException(msg);
        }
        ArrayList<ItemId> childIds = new ArrayList<ItemId>();
        Iterator<ChildNodeEntry> iter = ((NodeState) 
data.getState()).getChildNodeEntries().iterator();

        while (iter.hasNext()) {
            ChildNodeEntry entry = iter.next();
            // delay check for read-access until item is being built
            // thus avoid duplicate check
            
            // check for name
            if (namePattern == null && nameGlobs == null ||
                    namePattern != null
                    && 
ChildrenCollectorFilter.matches(sessionContext.getJCRName(entry.getName()), 
namePattern)
                    || nameGlobs != null
                    && 
ChildrenCollectorFilter.matches(sessionContext.getJCRName(entry.getName()), 
nameGlobs)) {
                childIds.add(entry.getId());
            }
        }

        return new LazyItemIterator(sessionContext, childIds, parentId);
    }


and in the NodeImpl we call ItemManager.getChildNodes() as follows:


    public NodeIterator getNodes(String namePattern) throws RepositoryException 
{
        return getNodesInternal(namePattern, null);
    }

    public NodeIterator getNodes(final String[] nameGlobs)
            throws RepositoryException {
        return getNodesInternal(null, nameGlobs);
    }

    private NodeIterator getNodesInternal(final String namePattern, final 
String[] nameGlobs) throws RepositoryException {
        return perform(new SessionOperation<NodeIterator>() {
            public NodeIterator perform(SessionContext context)
                    throws RepositoryException {
                try {
                    return itemMgr.getChildNodes((NodeId) id, namePattern, 
nameGlobs);
                } catch (ItemNotFoundException e) {
                    throw new RepositoryException(
                            "Failed to list child nodes of " + NodeImpl.this, 
e);
                } catch (AccessDeniedException e) {
                    throw new RepositoryException(
                            "Failed to list child nodes of " + NodeImpl.this, 
e);
                }
            }
            public String toString() {
                return "node.getNodes()";
            }
        });
    }



Could you please provide your feedback if the optimization makes sense and 
looks consistent or if there are any issues with those changes?

I could provide the patch of suggested changes. I just need to know against 
which branch of Jackrabbit code it should be done.

Thank you in advance!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to