Hmm, if there is still a discussion going on about such a subject, is it fair to close a hack proposal?
-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Pierre van Rooden
Sent: Wed 2004-05-12 12:42 PM
To: [EMAIL PROTECTED]
Cc:
Subject: CALL ENDS : HACK: HugeNodeListIterator
CALL FOR:
HugeNodeListIterator
Called by: Michiel Meeuwissen
Total tally on this call : -2
START: 2004-04-21 22:30
START OF VOTING: 2004-04-26 22:30
END OF CALL: 2004-04-29 22:30
YEA (0) :
ABSTAIN (3) : Pierre van Rooden, Eduard Witteveen, Rob van Maris
NAY (2) : Kees Jongenburger, Ernst Bunders
VETO (0) :
No votes, assumed abstained (10): Jaco de Groot, Marcel Maatkamp, Andre
van Toly, Johannes Verelst, Daniel Ockeloen, Rob Vermeulen, Rico Jansen,
Nico Klasens, Gerard van Enk, Mark Huijser
Call Result:
The call failed. The hack is not accepted.
> Michiel Meeuwissen wrote:
>
>> When you want to do exceptionally large iterations over MMBase nodes,
>> using the bridge, you might not be content with the current way to do
>> that:
>>
>>     NodeList list = nodeManager.getList(query);
>>     NodeIterator iterator = list.nodeIterator();
>>     while (iterator.hasNext()) {
>>         ...
>>     }
>>
>> Because you then need all nodes in memory ('list' in the above example),
>> and also the query result is cached, so the memory is not even given
>> back immediately after completion.
>>
>> I have created a job that wants to iterate over all nodes of a certain
>> type every night. There are quite a lot of such nodes, so I felt that I
>> needed something better here.
>>
>> I created a 'HugeNodeListIterator', a NodeIterator implementation that
>> one can use as follows:
>>
>>     NodeIterator iterator = new HugeNodeListIterator(query);
>>     while (iterator.hasNext()) {
>>         ...
>>     }
>>
>> The implementation of this iterator ensures that the query result is
>> not too big, by applying a 'limit' to the query and doing it in batches
>> (so the query is 'uniquely sorted' and the next batch is found by
>> applying a constraint).
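The batching technique described above is nowadays often called 'keyset pagination'. Below is a minimal, MMBase-free sketch of the idea under stated assumptions: `fetchBatch` is a hypothetical stand-in for the limited bridge query, and the sort key is assumed unique, whereas the actual hack uses a GREATER_EQUAL constraint plus duplicate-skipping to also cope with non-unique sorts.

```java
import java.util.*;
import java.util.stream.*;

// Sketch of the batching idea (keyset pagination): sort on a key, fetch
// at most BATCH_SIZE rows, then constrain the next query on the last key
// seen. 'fetchBatch' is a hypothetical stand-in for the limited query.
public class BatchedIteration {

    static final int BATCH_SIZE = 3; // small, for demonstration only

    // Simulates executing the query with a 'limit' and a >= constraint
    // on the (uniquely sorted) key.
    static List<Integer> fetchBatch(List<Integer> table, int fromKey) {
        return table.stream()
                    .filter(k -> k >= fromKey)
                    .sorted()
                    .limit(BATCH_SIZE)
                    .collect(Collectors.toList());
    }

    static List<Integer> collectAll() {
        // The 'table': unique keys, insertion order irrelevant.
        List<Integer> table = Arrays.asList(7, 1, 4, 9, 2, 8, 5);
        List<Integer> seen = new ArrayList<>();
        int fromKey = Integer.MIN_VALUE;
        while (true) {
            List<Integer> batch = fetchBatch(table, fromKey);
            if (batch.isEmpty()) {
                break; // at no point were more than BATCH_SIZE rows held
            }
            seen.addAll(batch);
            // The key is unique here, so 'strictly greater' finds the next
            // batch; the hack uses GREATER_EQUAL plus duplicate-skipping.
            fromKey = batch.get(batch.size() - 1) + 1;
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(collectAll()); // prints [1, 2, 4, 5, 7, 8, 9]
    }
}
```

Note that, unlike an offset, the key constraint stays correct when rows are inserted or deleted between batches, which is exactly the reason the hack avoids offsets.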
>>
>> Of course this is not perfect, because you would need no batching at
>> all if you had direct access to the database cursor, but you cannot
>> have that through the bridge. When and if this becomes possible, the
>> HugeNodeListIterator implementation can be changed in a
>> backwards-compatible way to use it. But I don't anticipate such a new
>> feature in the bridge anytime soon.
>>
>> The other current problem is that it accesses the cache framework to
>> take the query results from the Cache, which were put there by the
>> bridge. Therefore RMMCI cannot currently use this. This indicates that
>> there probably should come an RMMCI (or even MMCI) way to access the
>> Cache administration. The other option would perhaps be to add
>> HugeNodeListIterator itself somehow to RMMCI. And finally, if and when
>> the bridge has options to disable caching of a certain query result,
>> this implementation can of course be changed to use that, thereby
>> losing the need to access the Caches itself.
>>
>>
>> I want to offer this HugeNodeListIterator as a hack, because I think it
>> may be useful for others as well. I currently don't need it for an
>> MMBase project, so I can't add it like that.
>>
>> It is for CVS HEAD, in the package org.mmbase.bridge.util. It can also
>> be made to work in 1.7 environments. It is attached to this mail. But
>> note that before committing I may apply implementation improvements if
>> those are suggested in reaction.
>>
>>
>>
>> START OF CALL: 2004-04-21 22:30
>> END OF CALL: 2004-04-26 22:30
>>
>> [_] +1 (YES)
>> [_] +0 (ABSTAIN )
>> [_] -1 (NO), because :
>> [_] VETO, because:
>>
>>
>> Michiel
>>
>> --
>> Michiel Meeuwissen |
>> Mediacentrum 140 H'sum | +31 (0)35 6772979 | I hate
computers
>> nl_NL eo_XX en_US |
>> mihxil' |
>> [] () |
>>
>>
>>
------------------------------------------------------------------------
>>
>> /*
>>
>> This software is OSI Certified Open Source Software.
>> OSI Certified is a certification mark of the Open Source Initiative.
>>
>> The license (Mozilla version 1.0) can be read at the MMBase site.
>> See http://www.MMBase.org/license
>>
>> */
>>
>> package org.mmbase.bridge.util;
>>
>> import org.mmbase.bridge.*;
>> import org.mmbase.storage.search.*;
>> import org.mmbase.cache.*;
>>
>> import java.util.*;
>>
>> /**
>>  * Iterates the big result of a query. It avoids using a lot of memory
>>  * (which you would need if you got the complete NodeList first), and
>>  * pollution of the (node) cache. In this current implementation the
>>  * Query is 'batched' to avoid reading all nodes into memory, and the
>>  * batches are removed from the query caches.
>>  *
>>  * @author Michiel Meeuwissen
>>  * @version $Id: HugeNodeListIterator.java,v 1.2 2004/04/05 19:04:12 michiel Exp $
>>  * @since MMBase-1.8
>>  */
>>
>> public class HugeNodeListIterator implements NodeIterator {
>>
>>     // will not work through RMMCI, because caches are accessed.
>>     protected static MultilevelCache multilevelCache = MultilevelCache.getCache();
>>     protected static NodeListCache nodeListCache = NodeListCache.getCache();
>>
>>     public static final int DEFAULT_BATCH_SIZE = 10000;
>>
>>     protected NodeIterator nodeIterator;
>>     protected Node nextNode;
>>     protected Node previousNode;
>>
>>     protected Query originalQuery;
>>     protected int batchSize = DEFAULT_BATCH_SIZE;
>>
>>     protected int nextIndex = 0;
>>
>>
>>     /**
>>      * Constructor for this Iterator.
>>      *
>>      * @param query     The query which is used as a base for the
>>      *                  querie(s) to be executed.
>>      * @param batchSize The (approximate) size of the sub-queries,
>>      *                  should be a reasonably large number, like
>>      *                  10000 or so.
>>      */
>>     public HugeNodeListIterator(Query query, int batchSize) {
>>         this.batchSize = batchSize;
>>         init(query);
>>     }
>>
>>     /**
>>      * Constructor for this Iterator. The 'batchSize' is taken from the
>>      * query's 'maxnumber' property, or, if that is not set, it defaults
>>      * to 10000.
>>      *
>>      * @param query The query which is used as a base for the querie(s)
>>      *              to be executed.
>>      */
>>     public HugeNodeListIterator(Query query) {
>>         if (query.getMaxNumber() != SearchQuery.DEFAULT_MAX_NUMBER) {
>>             batchSize = query.getMaxNumber();
>>         } // else leave on default
>>         init(query);
>>     }
>>
>>     /**
>>      * Called by constructors only.
>>      */
>>     private void init(Query query) {
>>         if (query.getOffset() > 0) {
>>             throw new UnsupportedOperationException("Not implemented for queries with offset");
>>         }
>>         Queries.sortUniquely(query);
>>         originalQuery = query;
>>         executeNextQuery((Query) originalQuery.clone());
>>     }
>>
>>     /**
>>      * Executes the given query, taking into account whether it is a
>>      * NodeQuery or not, and applying the 'batchSize'. The result is
>>      * available in the 'nodeIterator' member.
>>      */
>>     protected void executeQuery(Query currentQuery) {
>>         currentQuery.setMaxNumber(batchSize);
>>         if (originalQuery instanceof NodeQuery) {
>>             NodeQuery nq = (NodeQuery) currentQuery;
>>             nodeIterator = nq.getNodeManager().getList(nq).nodeIterator();
>>             nodeListCache.remove(nq);
>>         } else {
>>             nodeIterator = currentQuery.getCloud().getList(currentQuery).nodeIterator();
>>             multilevelCache.remove(currentQuery);
>>         }
>>     }
>>
>>     /**
>>      * Executes the given query, and prepares to do 'next', so setting
>>      * 'nextNode' and 'previousNode'.
>>      */
>>     protected void executeNextQuery(Query q) {
>>         executeQuery(q);
>>         previousNode = nextNode;
>>         if (nodeIterator.hasNext()) {
>>             nextNode = nodeIterator.nextNode();
>>         } else {
>>             nextNode = null;
>>         }
>>     }
>>     /**
>>      * Executes the given query, and prepares to do 'previous', so
>>      * setting 'nextNode' and 'previousNode', and winds the new
>>      * nodeIterator to the end.
>>      */
>>     protected void executePreviousQuery(Query q) {
>>         executeQuery(q);
>>         nextNode = previousNode;
>>         while (nodeIterator.hasNext()) {
>>             nodeIterator.next();
>>         }
>>         if (nodeIterator.hasPrevious()) {
>>             previousNode = nodeIterator.previousNode();
>>         } else {
>>             previousNode = null;
>>         }
>>     }
>>
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public boolean hasNext() {
>>         return nextNode != null;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public boolean hasPrevious() {
>>         return previousNode != null;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public Object next() {
>>         return nextNode();
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public Object previous() {
>>         return previousNode();
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public int previousIndex() {
>>         return nextIndex - 1;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      */
>>     public int nextIndex() {
>>         return nextIndex;
>>     }
>>
>>
>>     /**
>>      * Used by nextNode and previousNode. Does a field-by-field compare
>>      * of two Node objects to check if they are equal. One would expect
>>      * the equals member function of Node to be usable for this, but
>>      * that seems not to be the case.
>>      */
>>     protected boolean equals(Node node1, Node node2) {
>>         Iterator i = node1.getNodeManager().getFields().iterator();
>>         while (i.hasNext()) {
>>             Field f = (Field) i.next();
>>             String name = f.getName();
>>             if (! node1.getValue(name).equals(node2.getValue(name))) {
>>                 return false;
>>             }
>>         }
>>         return true;
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      *
>>      * Implementation also calculates the next next Node, and gives
>>      * back the 'old' next Node, from now on known as 'previousNode'.
>>      */
>>     public Node nextNode() {
>>         if (nextNode != null) {
>>             nextIndex++;
>>             previousNode = nextNode;
>>             if (nodeIterator.hasNext()) {
>>                 nextNode = nodeIterator.nextNode();
>>             } else {
>>                 Query currentQuery = (Query) originalQuery.clone();
>>
>>                 // We don't use offset to determine the 'next' batch of
>>                 // query results, because there could have been
>>                 // deletions/insertions. We use the sort-order to apply
>>                 // a constraint.
>>                 SortOrder order = (SortOrder) originalQuery.getSortOrders().get(0);
>>                 Constraint cons;
>>                 if (order.getDirection() == SortOrder.ORDER_ASCENDING) {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.GREATER_EQUAL,
>>                                previousNode.getValue(order.getField().getFieldName()));
>>                 } else {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.LESS_EQUAL,
>>                                previousNode.getValue(order.getField().getFieldName()));
>>                 }
>>                 Queries.addConstraint(currentQuery, cons);
>>                 executeNextQuery(currentQuery);
>>                 // Perhaps the sort-order did not give a unique result;
>>                 // skip some nodes in that case.
>>                 // XXX This is wrong if (which is unlikely) more nodes
>>                 // than 'batchSize' follow.
>>                 while (nextNode != null && equals(nextNode, previousNode)) {
>>                     if (nodeIterator.hasNext()) {
>>                         nextNode = nodeIterator.nextNode();
>>                     } else {
>>                         nextNode = null;
>>                     }
>>                 }
>>             }
>>
>>             return previousNode; // looks odd, but really is what is meant.
>>         } else {
>>             throw new NoSuchElementException("No next element");
>>         }
>>     }
>>
>>     /**
>>      * {@inheritDoc}
>>      *
>>      * Implementation is analogous to nextNode.
>>      */
>>     public Node previousNode() {
>>         if (previousNode != null) {
>>             nextNode = previousNode;
>>             nextIndex--;
>>             if (nodeIterator.hasPrevious()) {
>>                 previousNode = nodeIterator.previousNode();
>>             } else {
>>                 Query currentQuery = (Query) originalQuery.clone();
>>                 SortOrder order = (SortOrder) originalQuery.getSortOrders().get(0);
>>                 Constraint cons;
>>                 if (order.getDirection() == SortOrder.ORDER_ASCENDING) {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.LESS_EQUAL,
>>                                nextNode.getValue(order.getField().getFieldName()));
>>                 } else {
>>                     cons = currentQuery.createConstraint(order.getField(),
>>                                FieldCompareConstraint.GREATER_EQUAL,
>>                                nextNode.getValue(order.getField().getFieldName()));
>>                 }
>>                 Queries.addConstraint(currentQuery, cons);
>>                 executePreviousQuery(currentQuery);
>>                 while (previousNode != null && equals(nextNode, previousNode)) {
>>                     if (nodeIterator.hasPrevious()) {
>>                         previousNode = nodeIterator.previousNode();
>>                     } else {
>>                         previousNode = null;
>>                     }
>>                 }
>>             }
>>             return nextNode;
>>         } else {
>>             throw new NoSuchElementException("No previous element");
>>         }
>>     }
>>
>>
>>     /**
>>      * @throws UnsupportedOperationException
>>      */
>>     public void remove() {
>>         throw new UnsupportedOperationException("Optional operation 'remove' not implemented");
>>     }
>>
>>     /**
>>      * @throws UnsupportedOperationException
>>      */
>>     public void add(Object o) {
>>         throw new UnsupportedOperationException("Optional operation 'add' not implemented");
>>     }
>>
>>     /**
>>      * @throws UnsupportedOperationException
>>      */
>>     public void set(Object o) {
>>         throw new UnsupportedOperationException("Optional operation 'set' not implemented");
>>     }
>>
>>     /**
>>      * Main, only for testing.
>>      */
>>     public static void main(String[] args) {
>>         Cloud cloud = ContextProvider.getDefaultCloudContext().getCloud("mmbase");
>>         NodeQuery q = cloud.getNodeManager("object").createQuery();
>>         HugeNodeListIterator nodeIterator = new HugeNodeListIterator(q);
>>         while (nodeIterator.hasNext()) {
>>             Node node = nodeIterator.nextNode();
>>             System.out.println(node.getFunctionValue("gui", null).toString());
>>         }
>>     }
>>
>> }
>
>
>
--
Pierre van Rooden
Mediapark, C 107 tel. +31 (0)35 6772815
"Never summon anything bigger than your head."
