CALL FOR: HugeNodeListIterator
Called by: Michiel Meeuwissen Total tally on this call : -1
START: 2004-04-21 22:30 START OF VOTING: 2004-04-26 22:30 END OF CALL: 2004-05-11 22:30
YEA (0) :
ABSTAIN (2) : Pierre van Rooden, Eduard Witteveen
NAY (1) : Kees Jongenburger
VETO (0) :
No votes, assumed abstained (12): Jaco de Groot, Marcel Maatkamp, Andre van Toly,
Johannes Verelst, Daniel Ockeloen, Rob Vermeulen, Rico Jansen,
Nico Klasens, Rob van Maris, Gerard van Enk, Mark Huijser, Ernst Bunders
Call result:
Since there were only three clear votes, I am extending this hack call to Tuesday, 11th May.
Michiel Meeuwissen wrote:
When you want to do exceptionally large iterations over MMBase nodes, using the bridge, you might not be content with the current way to do that:
  NodeList list = nodeManager.getList(query);
  NodeIterator iterator = list.nodeIterator();
  while (iterator.hasNext()) {
      ...
  }
Because you then need all the nodes in memory ('list' in the above example), and the query result is also cached, so the memory is not even released immediately after completion.
I have created a job that wants to iterate over all nodes of a certain type every night. There are quite a lot of such nodes, so I felt that I needed something better here.
I created a 'HugeNodeListIterator', a NodeListIterator implementation that one can use as follows:
  NodeIterator iterator = new HugeNodeListIterator(query);
  while (iterator.hasNext()) {
      ...
  }
The implementation of this NodeListIterator ensures that each query result is not too big, by applying a 'limit' to the query and executing it in batches (the query is 'uniquely sorted', and the next batch is found by applying a constraint on the sort field).
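The batching idea can be sketched independently of the MMBase bridge. The following self-contained Java example is only an illustration of the technique: the class and method names (KeysetBatchDemo, fetchBatch, iterateAll) are invented, and a plain sorted list of integers stands in for the storage layer. It shows how 'keyset' batching with a greater-or-equal constraint on the unique sort key walks through all rows while never holding more than one batch in memory:

```java
import java.util.*;

// Sketch of the batching idea used by HugeNodeListIterator: instead of an
// OFFSET, each batch is fetched with a constraint on the unique sort key of
// the last row seen. All names here are hypothetical; a sorted List<Integer>
// stands in for the database table.
public class KeysetBatchDemo {
    static final int BATCH_SIZE = 3;

    // Simulates "SELECT key FROM table WHERE key >= after ORDER BY key LIMIT 3"
    static List<Integer> fetchBatch(List<Integer> table, int after) {
        List<Integer> batch = new ArrayList<>();
        for (int key : table) {
            if (key >= after && batch.size() < BATCH_SIZE) {
                batch.add(key);
            }
        }
        return batch;
    }

    static List<Integer> iterateAll(List<Integer> table) {
        List<Integer> result = new ArrayList<>();
        int after = Integer.MIN_VALUE;
        while (true) {
            List<Integer> batch = fetchBatch(table, after);
            for (int key : batch) {
                // A GREATER_EQUAL constraint is used (not GREATER), so the last
                // key of the previous batch reappears first and must be skipped.
                if (result.isEmpty() || key > result.get(result.size() - 1)) {
                    result.add(key);
                }
            }
            if (batch.size() < BATCH_SIZE) {
                break; // a partial batch means the table is exhausted
            }
            after = batch.get(batch.size() - 1);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> table = Arrays.asList(1, 2, 3, 5, 8, 13, 21);
        System.out.println(iterateAll(table)); // prints [1, 2, 3, 5, 8, 13, 21]
    }
}
```

The GREATER_EQUAL-plus-skip dance mirrors the while loop in nextNode() below: a greater-or-equal constraint is needed because with a non-unique sort order a plain 'greater than' could skip rows, at the price of re-reading (and discarding) the boundary row of each batch.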
Of course this is not perfect, because you would need no batching at all if you had direct access to the database cursor, but you cannot have that through the bridge. When and if this becomes possible, the HugeNodeListIterator implementation can be changed in a backwards-compatible way to use it. But I don't anticipate such a new feature in the bridge anytime soon.
The other current problem is that it accesses the cache framework to remove the query results from the Cache, which were put there by the bridge. Therefore RMMCI cannot currently use this. This indicates that there should probably be an RMMCI (or even MMCI) way to access the Cache administration. The other option would perhaps be to add HugeNodeListIterator itself somehow to RMMCI. And finally, if and when the bridge has options to disable caching of a certain query result, this implementation can of course be changed to use that, losing the need to access the Caches itself.
I want to offer this HugeNodeListIterator as a hack, because I think it may be useful for others as well. I currently don't need it for an MMBase project, so I can't add it like that.
It is for CVS HEAD, in the package org.mmbase.bridge.util. It can also be made to work in 1.7 environments. It is attached to this mail. Note that before committing I may apply implementation improvements if any are suggested in reaction.
START OF CALL: 2004-04-21 22:30 END OF CALL: 2004-04-26 22:30
[_] +1 (YES) [_] +0 (ABSTAIN ) [_] -1 (NO), because : [_] VETO, because:
Michiel
--
Michiel Meeuwissen |
Mediacentrum 140 H'sum | +31 (0)35 6772979 | I hate computers
nl_NL eo_XX en_US |
mihxil' |
[] () |
------------------------------------------------------------------------
/*
This software is OSI Certified Open Source Software. OSI Certified is a certification mark of the Open Source Initiative.
The license (Mozilla version 1.0) can be read at the MMBase site. See http://www.MMBase.org/license
*/
package org.mmbase.bridge.util;
import org.mmbase.bridge.*;
import org.mmbase.storage.search.*;
import org.mmbase.cache.*;
import java.util.*;
/**
 * Iterates the big result of a query. It avoids using a lot of memory (which you would need if you
 * get the complete NodeList first), and pollution of the (node) cache. In this current
 * implementation the Query is 'batched' to avoid reading all nodes into memory, and the batches
 * are removed from the query caches.
 *
 * @author Michiel Meeuwissen
 * @version $Id: HugeNodeListIterator.java,v 1.2 2004/04/05 19:04:12 michiel Exp $
 * @since MMBase-1.8
 */
public class HugeNodeListIterator implements NodeIterator {
// will not work through RMMCI, because caches are accessed.
    protected static MultilevelCache multilevelCache = MultilevelCache.getCache();
    protected static NodeListCache   nodeListCache   = NodeListCache.getCache();
public static final int DEFAULT_BATCH_SIZE = 10000;
    protected NodeIterator nodeIterator;
    protected Node nextNode;
    protected Node previousNode;

    protected Query originalQuery;
    protected int batchSize = DEFAULT_BATCH_SIZE;
    protected int nextIndex = 0;
    /**
     * Constructor for this Iterator.
     *
     * @param query     The query which is used as a base for the queries to be executed.
     * @param batchSize The (approximate) size of the sub-queries; should be a reasonably large
     *                  number, like 10000 or so.
     */
    public HugeNodeListIterator(Query query, int batchSize) {
        this.batchSize = batchSize;
        init(query);
    }
    /**
     * Constructor for this Iterator. The 'batchSize' is taken from the query's 'maxnumber'
     * property, or, if that is not set, it defaults to 10000.
     *
     * @param query The query which is used as a base for the queries to be executed.
     */
    public HugeNodeListIterator(Query query) {
        if (query.getMaxNumber() != SearchQuery.DEFAULT_MAX_NUMBER) {
            batchSize = query.getMaxNumber();
        } // else leave on default
        init(query);
    }
    /**
     * Called by constructors only.
     */
    private void init(Query query) {
        if (query.getOffset() > 0) {
            throw new UnsupportedOperationException("Not implemented for queries with offset");
        }
        Queries.sortUniquely(query);
        originalQuery = query;
        executeNextQuery((Query) originalQuery.clone());
    }
/**
 * Executes the given query, taking into account whether it is a NodeQuery or not, and
 * applying the 'batchSize'. The result is available in the 'nodeIterator' member.
*/
protected void executeQuery(Query currentQuery) {
currentQuery.setMaxNumber(batchSize);
if (originalQuery instanceof NodeQuery) {
NodeQuery nq = (NodeQuery) currentQuery;
nodeIterator = nq.getNodeManager().getList(nq).nodeIterator();
nodeListCache.remove(nq);
} else {
nodeIterator = currentQuery.getCloud().getList(currentQuery).nodeIterator();
multilevelCache.remove(currentQuery);
}
}
    /**
     * Executes the given query, and prepares to do 'next', setting 'nextNode' and 'previousNode'.
*/
protected void executeNextQuery(Query q) {
executeQuery(q);
previousNode = nextNode;
if (nodeIterator.hasNext()) {
            nextNode = nodeIterator.nextNode();
        } else {
nextNode = null;
}
}
    /**
     * Executes the given query, and prepares to do 'previous', setting 'nextNode' and
* 'previousNode', and winds the new nodeIterator to the end.
*/
protected void executePreviousQuery(Query q) {
executeQuery(q);
nextNode = previousNode;
while (nodeIterator.hasNext()) {
nodeIterator.next();
}
        if (nodeIterator.hasPrevious()) {
            previousNode = nodeIterator.previousNode();
        } else {
previousNode = null;
}
}
    /**
     * {@inheritDoc}
     */
    public boolean hasNext() {
        return nextNode != null;
    }

    /**
     * {@inheritDoc}
     */
    public boolean hasPrevious() {
        return previousNode != null;
    }

    /**
     * {@inheritDoc}
     */
    public Object next() {
        return nextNode();
    }

    /**
     * {@inheritDoc}
     */
    public Object previous() {
        return previousNode();
    }

    /**
     * {@inheritDoc}
     */
    public int previousIndex() {
        return nextIndex - 1;
    }

    /**
     * {@inheritDoc}
     */
    public int nextIndex() {
        return nextIndex;
    }
/**
* Used by nextNode and previousNode. Does a field-by-field compare of two Node objects to check
 * if they are equal. One would expect the equals method of Node to be usable for
 * this, but that seems not to be the case.
*/
protected boolean equals(Node node1, Node node2) {
Iterator i = node1.getNodeManager().getFields().iterator();
while (i.hasNext()) {
Field f = (Field) i.next();
String name = f.getName();
            if (! node1.getValue(name).equals(node2.getValue(name))) {
                return false;
            }
        }
return true;
}
    /**
     * {@inheritDoc}
     *
     * The implementation also calculates the next 'next' Node, and returns the 'old'
     * next Node, from now on known as 'previousNode'.
     */
    public Node nextNode() {
        if (nextNode != null) {
            nextIndex++;
            previousNode = nextNode;
            if (nodeIterator.hasNext()) {
                nextNode = nodeIterator.nextNode();
            } else {
                Query currentQuery = (Query) originalQuery.clone();
                // We don't use an offset to determine the 'next' batch of query results,
                // because there could have been deletions/insertions in the meantime.
                // Instead we use the sort order to apply a constraint.
                SortOrder order = (SortOrder) originalQuery.getSortOrders().get(0);
                Constraint cons;
                if (order.getDirection() == SortOrder.ORDER_ASCENDING) {
                    cons = currentQuery.createConstraint(order.getField(), FieldCompareConstraint.GREATER_EQUAL, previousNode.getValue(order.getField().getFieldName()));
                } else {
                    cons = currentQuery.createConstraint(order.getField(), FieldCompareConstraint.LESS_EQUAL, previousNode.getValue(order.getField().getFieldName()));
                }
                Queries.addConstraint(currentQuery, cons);
                executeNextQuery(currentQuery);
                // Perhaps the sort order did not produce a unique ordering; skip some nodes in that case.
                // XXX This is wrong if (unlikely though it is) more than 'batchSize' equal nodes follow.
                while (nextNode != null && equals(nextNode, previousNode)) {
                    if (nodeIterator.hasNext()) {
                        nextNode = nodeIterator.nextNode();
                    } else {
                        nextNode = null;
                    }
                }
            }
            return previousNode; // looks odd, but really is what is meant
        } else {
            throw new NoSuchElementException("No next element");
        }
    }
/**
     * {@inheritDoc}
*
* Implementation is analogous to nextNode.
*/
public Node previousNode() {
if (previousNode != null) {
nextNode = previousNode;
nextIndex --;
if (nodeIterator.hasPrevious()) {
previousNode = nodeIterator.previousNode();
} else {
Query currentQuery = (Query) originalQuery.clone();
SortOrder order = (SortOrder) originalQuery.getSortOrders().get(0);
Constraint cons;
if (order.getDirection() == SortOrder.ORDER_ASCENDING) {
cons = currentQuery.createConstraint(order.getField(), FieldCompareConstraint.LESS_EQUAL, nextNode.getValue(order.getField().getFieldName()));
} else {
cons = currentQuery.createConstraint(order.getField(), FieldCompareConstraint.GREATER_EQUAL, nextNode.getValue(order.getField().getFieldName()));
}
Queries.addConstraint(currentQuery, cons);
executePreviousQuery(currentQuery);
while(previousNode != null && equals(nextNode, previousNode)) {
if (nodeIterator.hasPrevious()) {
previousNode = nodeIterator.previousNode();
} else {
previousNode = null;
}
}
}
return nextNode;
} else {
throw new NoSuchElementException("No previous element");
}
}
    /**
     * @throws UnsupportedOperationException
     */
public void remove() {
throw new UnsupportedOperationException("Optional operation 'remove' not implemented");
}
    /**
     * @throws UnsupportedOperationException
     */
public void add(Object o) {
throw new UnsupportedOperationException("Optional operation 'add' not implemented");
}
    /**
     * @throws UnsupportedOperationException
     */
public void set(Object o) {
throw new UnsupportedOperationException("Optional operation 'set' not implemented");
}
    /**
     * Main only for testing.
     */
    public static void main(String[] args) {
        Cloud cloud = ContextProvider.getDefaultCloudContext().getCloud("mmbase");
        NodeQuery q = cloud.getNodeManager("object").createQuery();
        HugeNodeListIterator nodeIterator = new HugeNodeListIterator(q);
        while (nodeIterator.hasNext()) {
            Node node = nodeIterator.nextNode();
            System.out.println(node.getFunctionValue("gui", null).toString());
        }
}
}
