Hmm, if there is still a discussion going on about such a subject, is it fair to close a hack proposal?
-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Pierre van Rooden
Sent: Wed 2004-05-12 12:42 PM
To: [EMAIL PROTECTED]
Cc:
Subject: CALL ENDS : HACK: HugeNodeListIterator
CALL FOR:
HugeNodeListIterator
Called by: Michiel Meeuwissen
Total tally on this call : -2
START: 2004-04-21 22:30
START OF VOTING: 2004-04-26 22:30
END OF CALL: 2004-04-29 22:30
YEA (0) :
ABSTAIN (3) : Pierre van Rooden, Eduard Witteveen, Rob van Maris
NAY (2) : Kees Jongenburger, Ernst Bunders
VETO (0) :
No votes, assumed abstained (10): Jaco de Groot, Marcel Maatkamp, Andre
van Toly, Johannes Verelst, Daniel Ockeloen, Rob Vermeulen, Rico Jansen,
Nico Klasens, Gerard van Enk, Mark Huijser
Call Result:
The call failed. The hack is not accepted.
> Michiel Meeuwissen wrote:
>
>> When you want to do exceptionally large iterations over MMBase nodes,
>> using the bridge, you might not be content with the current way to do
>> that:
>>
>>     NodeList list = nodeManager.getList(query);
>>     NodeIterator iterator = list.nodeIterator();
>>     while (iterator.hasNext()) {
>>         ...
>>     }
>>
>> Because you then need all nodes in memory ('list' in the above example),
>> and also the query result is cached, so the memory is not even given
>> back immediately after completion.
>>
>> I have created a job that wants to iterate over all nodes of a certain
>> type every night. There are quite a lot of such nodes, so I felt that I
>> needed something better here.
>>
>> I created a 'HugeNodeListIterator', a NodeIterator implementation that
>> one can use as follows:
>>
>>     NodeIterator iterator = new HugeNodeListIterator(query);
>>     while (iterator.hasNext()) {
>>         ...
>>     }
>>
>> The implementation of this iterator ensures that the query result is
>> not too big, by applying a 'limit' to the query and doing it in batches
>> (so the query is 'uniquely sorted' and the next batch is found by
>> applying a constraint).
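The batching technique described above is nowadays often called 'keyset pagination'. Below is a minimal, MMBase-free sketch of the idea under stated assumptions: `fetchBatch` is a hypothetical stand-in for the limited bridge query, and the sort key is assumed unique, whereas the actual hack uses a GREATER_EQUAL constraint plus duplicate-skipping to also cope with non-unique sorts.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the batching idea (keyset pagination): sort on a key, fetch
// at most BATCH_SIZE rows, then constrain the next query on the last key
// seen. 'fetchBatch' is a hypothetical stand-in for the limited query.
public class BatchedIteration {

    static final int BATCH_SIZE = 3; // small, for demonstration only

    // Simulates executing the query with a 'limit' and a >= constraint
    // on the (uniquely sorted) key.
    static List<Integer> fetchBatch(List<Integer> table, int fromKey) {
        return table.stream()
                    .filter(k -> k >= fromKey)
                    .sorted()
                    .limit(BATCH_SIZE)
                    .collect(Collectors.toList());
    }

    static List<Integer> collectAll() {
        // The 'table': unique keys, insertion order irrelevant.
        List<Integer> table = Arrays.asList(7, 1, 4, 9, 2, 8, 5);
        List<Integer> seen = new ArrayList<>();
        int fromKey = Integer.MIN_VALUE;
        while (true) {
            List<Integer> batch = fetchBatch(table, fromKey);
            if (batch.isEmpty()) {
                break; // at no point were more than BATCH_SIZE rows held
            }
            seen.addAll(batch);
            // The key is unique here, so 'strictly greater' finds the next
            // batch; the hack uses GREATER_EQUAL plus duplicate-skipping.
            fromKey = batch.get(batch.size() - 1) + 1;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectAll()); // prints [1, 2, 4, 5, 7, 8, 9]
    }
}
```

Note that, unlike an offset, the key constraint stays correct when rows are inserted or deleted between batches, which is exactly the reason the hack avoids offsets.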
>>
>> Of course this is not perfect, because you would need no batching at
>> all if you had direct access to the database cursor, but you cannot
>> have that through the bridge. When and if this becomes possible, the
>> HugeNodeListIterator implementation can be changed in a
>> backwards-compatible way to use it. But I don't anticipate such a new
>> feature in the bridge anytime soon.
>>
>> The other current problem is that it accesses the cache framework to
>> take the query results from the Cache, which were put there by the
>> bridge. Therefore RMMCI cannot currently use this. This indicates that
>> there probably should come an RMMCI (or even MMCI) way to access the
>> Cache administration. The other option would perhaps be to add
>> HugeNodeListIterator itself somehow to RMMCI. And finally, if and when
>> the bridge has options to disable caching of a certain query result,
>> this implementation can of course be changed to use that, thereby
>> losing the need to access the Caches itself.
>>
>>
>> I want to offer this HugeNodeListIterator as a hack, because I think it
>> may be useful for others as well. I currently don't need it for an
>> MMBase project, so I can't add it like that.
>>
>> It is for CVS HEAD, in the package org.mmbase.bridge.util. It can also
>> be made to work in 1.7 environments. It is attached to this mail. But
>> note that before committing I may apply implementation improvements if
>> those are suggested in reaction.
>>
>>
>>
>> START OF CALL: 2004-04-21 22:30
>> END OF CALL: 2004-04-26 22:30
>>
>> [_] +1 (YES)
>> [_] +0 (ABSTAIN )
>> [_] -1 (NO), because :
>> [_] VETO, because:
>>
>>
>> Michiel
>>
>> --
>> Michiel Meeuwissen |
>> Mediacentrum 140 H'sum | +31 (0)35 6772979 | I hate
computers
>> nl_NL eo_XX en_US |
>> mihxil' |
>> [] () |
>>
>>
>>
------------------------------------------------------------------------
>>
>> /*
>>
>> This software is OSI Certified Open Source Software.
>> OSI Certified is a certification mark of the Open Source Initiative.
>>
>> The license (Mozilla version 1.0) can be read at the MMBase site.
>> See http://www.MMBase.org/license
>>
>> */
>>
>> package org.mmbase.bridge.util;
>>
>> import org.mmbase.bridge.*;
>> import org.mmbase.storage.search.*;
>> import org.mmbase.cache.*;
>>
>> import java.util.*;
>>
>> /**
>>  * Iterates the big result of a query. It avoids using a lot of memory
>>  * (which you would need if you got the complete NodeList first), and
>>  * pollution of the (node) cache. In this current implementation the
>>  * Query is 'batched' to avoid reading all nodes into memory, and the
>>  * batches are removed from the query caches.
>>  *
>>  * @author Michiel Meeuwissen
>>  * @version $Id: HugeNodeListIterator.java,v 1.2 2004/04/05 19:04:12 michiel Exp $
>>  * @since MMBase-1.8
>>  */
>>
>> public class HugeNodeListIterator implements NodeIterator {
>>
>>     // will not work through RMMCI, because caches are accessed.
>>     protected static MultilevelCache multilevelCache = MultilevelCache.getCache();
>>     protected static NodeListCache nodeListCache = NodeListCache.getCache();
>>
>>     public static final int DEFAULT_BATCH_SIZE = 10000;
>>
>>     protected NodeIterator nodeIterator;
>>     protected Node nextNode;
>>     protected Node previousNode;
>>
>>     protected Query originalQuery;
>>     protected int batchSize = DEFAULT_BATCH_SIZE;
>>
>>     protected int nextIndex = 0;
>>
>>
>>     /**
>>      * Constructor for this Iterator.
>>      *
>>      * @param query     The query which is used as a base for the
>>      *                  querie(s) to be executed.
>>      * @param batchSize The (approximate) size of the sub-queries,
>>      *                  should be a reasonably large number, like
>>      *                  10000 or so.
>>      */
>>     public HugeNodeListIterator(Query query, int batchSize) {
>>         this.batchSize = batchSize;
>>         init(query);
>>     }
>>
>>     /**
>>      * Constructor for this Iterator. The 'batchSize' is taken from the
>>      * query's 'maxnumber' property, or, if that is not set, it defaults
>>      * to 10000.
>>      *
>>      * @param query The query which is used as a base for the querie(s)
>>      *              to be executed.
>>      */
>>     public HugeNodeListIterator(Query query) {
>>         if (query.getMaxNumber() != SearchQuery.DEFAULT_MAX_NUMBER) {
>>             batchSize = query.getMaxNumber();
>>         } // else leave on default
>>         init(query);
>>     }
>>
>>     /**
>>      * Called by constructors only.
>>      */
>>     private void init(Query query) {
>>         if (query.getOffset() > 0) {
>>             throw new UnsupportedOperationException("Not implemented for queries with offset");
>>         }
>>         Queries.sortUniquely(query);
>>         originalQuery = query;
>>         executeNextQuery((Query) originalQuery.clone());
>>     }
>>
>>     /**
>>      * Executes the given query, taking into account whether it is a
>>      * NodeQuery or not, and applying the 'batchSize'. The result is
>>      * available in the 'nodeIterator' member.
>>      */
>>     protected void executeQuery(Query currentQuery) {
>>         currentQuery.setMaxNumber(batchSize);
>>         if (originalQuery instanceof NodeQuery) {
>>             NodeQuery nq = (NodeQuery) currentQuery;
>>             nodeIterator = nq.getNodeManager().getList(nq).nodeIterator();
>>             nodeListCache.remove(nq);
>>         } else {
>>             nodeIterator = currentQuery.getCloud().getList(currentQuery).nodeIterator();
>>             multilevelCache.remove(currentQuery);
>>         }
>>     }
>>
>>     /**
>>      * Executes the given query, and prepares to do 'next', so setting
>>      * 'nextNode' and 'previousNode'.
>>      */
>>     protected void executeNextQuery(Query q) {
>>         executeQuery(q);
>>         previousNode = nextNode;
>>         if (nodeIterator.hasNext()) {
>>             nextNode = nodeIterator.nextNode();
>>         } else {
>>             nextNode = null;
>>         }
>>     }
>>     /**
>>      * Executes the given query, and prepares to do 'previous', so
>>      * setting 'nextNode' and 'previousNode', and winds the new
>>      * nodeIterator to the end.
>>      */
>>     protected void executePreviousQuery(Query q) {
>>         executeQuery(q);
>>         nextNode = previousNode;
>>         while (nodeIterator.hasNext()) {
>>             nodeIterator.next();
>>         }
>>         if (nodeIterator.hasPrevious()) {
>>             previousNode = nodeIterator.previousNode();
>>         } else {
>>             previousNode = null;
>>         }
>>     }
>>
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public boolean hasNext() {
>>         return nextNode != null;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public boolean hasPrevious() {
>>         return previousNode != null;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public Object next() {
>>         return nextNode();
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public Object previous() {
>>         return previousNode();
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public int previousIndex() {
>>         return nextIndex - 1;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public int nextIndex() {
>>         return nextIndex;
>>     }
>>
>>
>>     /**
>>      * Used by nextNode and previousNode. Does a field-by-field compare
>>      * of two Node objects to check if they are equal. One would expect
>>      * the equals member function of Node to be usable for this, but
>>      * that seems not to be the case.
>>      */
>>     protected boolean equals(Node node1, Node node2) {
>>         Iterator i = node1.getNodeManager().getFields().iterator();
>>         while (i.hasNext()) {
>>             Field f = (Field) i.next();
>>             String name = f.getName();
>>             if (! node1.getValue(name).equals(node2.getValue(name))) {
>>                 return false;
>>             }
>>         }
>>         return true;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      *
>>      * Implementation also calculates the next next Node, and gives
>>      * back the 'old' next Node, from now on known as 'previousNode'.
>>      */
>>     public Node nextNode() {
>>         if (nextNode != null) {
>>             nextIndex++;
>>             previousNode = nextNode;
>>             if (nodeIterator.hasNext()) {
>>                 nextNode = nodeIterator.nextNode();
>>             } else {
>>                 Query currentQuery = (Query) originalQuery.clone();
>>
>>                 // We don't use offset to determine the 'next' batch of
>>                 // query results, because there could have been
>>                 // deletions/insertions. We use the sort-order to apply
>>                 // a constraint.
>>                 SortOrder order = (SortOrder) originalQuery.getSortOrders().get(0);
>>                 Constraint cons;
>>                 if (order.getDirection() == SortOrder.ORDER_ASCENDING) {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.GREATER_EQUAL,
>>                                previousNode.getValue(order.getField().getFieldName()));
>>                 } else {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.LESS_EQUAL,
>>                                previousNode.getValue(order.getField().getFieldName()));
>>                 }
>>                 Queries.addConstraint(currentQuery, cons);
>>                 executeNextQuery(currentQuery);
>>                 // Perhaps the sort-order did not give a unique result;
>>                 // skip some nodes in that case.
>>                 // XXX This is wrong if (which is unlikely) more nodes
>>                 // than 'batchSize' follow.
>>                 while (nextNode != null && equals(nextNode, previousNode)) {
>>                     if (nodeIterator.hasNext()) {
>>                         nextNode = nodeIterator.nextNode();
>>                     } else {
>>                         nextNode = null;
>>                     }
>>                 }
>>             }
>>
>>             return previousNode; // looks odd, but really is what is meant.
>>         } else {
>>             throw new NoSuchElementException("No next element");
>>         }
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      *
>>      * Implementation is analogous to nextNode.
>>      */
>>     public Node previousNode() {
>>         if (previousNode != null) {
>>             nextNode = previousNode;
>>             nextIndex--;
>>             if (nodeIterator.hasPrevious()) {
>>                 previousNode = nodeIterator.previousNode();
>>             } else {
>>                 Query currentQuery = (Query) originalQuery.clone();
>>                 SortOrder order = (SortOrder) originalQuery.getSortOrders().get(0);
>>                 Constraint cons;
>>                 if (order.getDirection() == SortOrder.ORDER_ASCENDING) {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.LESS_EQUAL,
>>                                nextNode.getValue(order.getField().getFieldName()));
>>                 } else {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.GREATER_EQUAL,
>>                                nextNode.getValue(order.getField().getFieldName()));
>>                 }
>>                 Queries.addConstraint(currentQuery, cons);
>>                 executePreviousQuery(currentQuery);
>>                 while (previousNode != null && equals(nextNode, previousNode)) {
>>                     if (nodeIterator.hasPrevious()) {
>>                         previousNode = nodeIterator.previousNode();
>>                     } else {
>>                         previousNode = null;
>>                     }
>>                 }
>>             }
>>             return nextNode;
>>         } else {
>>             throw new NoSuchElementException("No previous element");
>>         }
>>     }
>>
>>
>>     /**
>>      * @throws UnsupportedOperationException
>>      */
>>     public void remove() {
>>         throw new UnsupportedOperationException("Optional operation 'remove' not implemented");
>>     }
>>
>>     /**
>>      * @throws UnsupportedOperationException
>>      */
>>     public void add(Object o) {
>>         throw new UnsupportedOperationException("Optional operation 'add' not implemented");
>>     }
>>
>>     /**
>>      * @throws UnsupportedOperationException
>>      */
>>     public void set(Object o) {
>>         throw new UnsupportedOperationException("Optional operation 'set' not implemented");
>>     }
>>
>>     /**
>>      * Main, only for testing.
>>      */
>>     public static void main(String[] args) {
>>         Cloud cloud = ContextProvider.getDefaultCloudContext().getCloud("mmbase");
>>         NodeQuery q = cloud.getNodeManager("object").createQuery();
>>         HugeNodeListIterator nodeIterator = new HugeNodeListIterator(q);
>>         while (nodeIterator.hasNext()) {
>>             Node node = nodeIterator.nextNode();
>>             System.out.println(node.getFunctionValue("gui", null).toString());
>>         }
>>     }
>>
>> }
>
>
>
--
Pierre van Rooden
Mediapark, C 107 tel. +31 (0)35 6772815
"Never summon anything bigger than your head."
