On Fri, 19 Dec 2003, Knut Ploder wrote: > > Are there any implementations of the primitive collections? > I mean some class that implements e.g. > org.apache.commons.collections.primitives.LongCollection (other than the > abstract org.apache.commons.collections.primitives.AbstractLongCollection). > > I just cannot find one; I tried > * commons-primitives-1.0 > * collections-2.1 > but no luck.
There are two concrete implementations of LongCollection in commons-primitives-1.0 (also present but deprecated in the commons-collections HEAD), namely ArrayLongList (<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/ArrayLongList.html>) and ArrayUnsignedIntList (<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/ArrayUnsignedIntList.html>). More generally, see Array<Type>List for a concrete implementation of <Type>Collection. (<http://jakarta.apache.org/commons/primitives/apidocs/org/apache/commons/collections/primitives/package-summary.html>). (Actually, there's other concrete instances like CollectionLongCollection and ListLongList in the adapters package and UnmodifiableLongCollection and UnmodifiableLongList in the decorators package, but those need to be "seeded" with an instance of Collection/List or LongCollection/LongList, respectively.) > I found a IntHashMap in [lang], which uses > > > The reason for asking is, I have written a nice class LongSet, which > happens to implement > org.apache.commons.collections.primitives.LongCollection (all I had to do > is add 'implements > org.apache.commons.collections.primitives.LongCollection' :-) > > It is quite fast, and needs only little memory. > just in case my code is not a duplication (but, as I said in the prolog, I > couldn't find anything on jakarta) I would be happy to contribute my code. Currently there is no implementation of <Type>Set (or for that matter <Type>Map) in commons-primitives, or even a declaration of the <Type>Set or <Type>Map interface. I haven't had a chance to look at your implemenation in detail, but from your overview it sounds quite reasonable. > let me explain my approach: > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - > I'm using some hashtable (nothing new). > but for the collisions-chains, I do NOT instantiate Entry-objects; rather, > I allocate 2 arrays of length capacity, one for the keys, one for the > next-pointer, like: > > long[] keys = new long[capacity]; > int[] next = new int[capacity]; > > this way I save quite a lot of memory (I have the memory-overhead of 2 > array-objects, rather than the overhead for k objects of the form > class Entry { > long key; > Entry next; > } > which was my initial form of implementing the collision chain (well, I > started with java.util.LinkedList, which requires Long-objects rather than > long-primitives, requiring even more memory). > > > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - > the idea can easily be adapted to Maps (in fact I have done a > LongObjectMap). > and - of course - it can be used for any other primitive type (I chose long > because that's what i needed). > > and now the actual code: > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - > package com.knuti.util.nativecollections; > > //import java.util.Iterator; > import java.util.NoSuchElementException; > > import org.apache.commons.collections.primitives.AbstractLongCollection; > import org.apache.commons.collections.primitives.LongIterator; > > > /** > * A Set with elements of type <code>long</code>. > * > * It uses a hashtable, but not a standard version from > <code>java.util</code>. > * My hashtable differs in the implementation for the collision chains: > * <ul> > * <li>I use an array of long, plus an array of int, > * <li>rather than an ordinary LinkedList. > * </ul> > * This saves lots of memory (overhead of many small objects is replaced by > overhead of 2 array-objects), > * and causes little work for GC. > * > * @author Knut Ploder > * > * @version 1.1 adapt to jakarta collection primitives (replacing my - > conceptionally identical - LongIterator) > * > */ > public class LongSet extends AbstractLongCollection { > > > /** > * The load above which rehashing occurs. > * <br> > * Here, load means the ratio <tt>buckets/capacity</tt>. > * In other words, the (maximum) average length of a collision-chain, > * or the average number of items in a bucket of the hashtable. > * <br>note that the loadfactor may exceed 1.0 > */ > protected static final float DEFAULT_LOAD_FACTOR = 0.8f; > > > /** > * The default initial capacity for the hash table. > * <br> > * Here, capacity means the number of items the Set can hold. > * Rehashing occurs when the are more than capacity items in the Set, > * which is in contrast to many other implementaions, > * where rehashing occurs already at size > <tt>capacity*loadfactor</tt> items. > */ > protected static final int DEFAULT_INITIAL_CAPACITY = 11; > > > /** > * @see {link #DEFAULT_LOAD_FACTOR} > */ > protected float loadFactor; > > > /** > * The current number of items in the set. > * <br>Do not mix up with capacity. > */ > int _itemCtr; > > > /** > * the buckets of the hashtable. > * they hold an index into [EMAIL PROTECTED] #entries}, or -1 to indicate null > */ > private int[] buckets; > > > /** > * this is a replacement for the linked-lists > * @see #Entries > */ > private Entries entries; > > > /** > * Return the number of elements in the map. > */ > public final int size() { > return _itemCtr; > } > > > > // > // constructors > // > > > > /** > * Creates a new <code>LongSet</code> instance, > * with the default capacity > * and load factor. > */ > public LongSet() { > this(DEFAULT_INITIAL_CAPACITY, DEFAULT_LOAD_FACTOR); > } > > > /** > * Creates a new <code>LongSet</code> instance, > * with the default load factor, > * that can hold up to <tt>initialCapacity</tt> elements > * without triggering a rehash. > * > * @param initialCapacity an <code>int</code> value > */ > public LongSet(int initialCapacity) { > this(initialCapacity, DEFAULT_LOAD_FACTOR); > } > > > /** > * Creates a new <code>LongSet</code> instance, > * with load factor <tt>loadFactor</tt>, > * that can hold <tt>initialCapacity</tt> elements > * without triggering a rehash. > * > * @param initialCapacity an <code>int</code> value > * @param loadFactor a <code>float</code> value > */ > public LongSet(int initialCapacity, float loadFactor) { > super(); > this.loadFactor = loadFactor; > setUp(initialCapacity); > } > > > > // > // internal utility stuff > // > > > > /** > * Initializes the Set so that it can hold <tt>desiredCapacity</tt> > elements. > * <p> > * the number of buckets is <tt>desiredCapacity / loadfactor</tt>. > * <p>for example, > * a desiredCapacity = 1000 means you want a Set that can hold (up > to) 1000 entries without resize()ing. > * if the loadfactor is 80%, you need 1000 / 0.8 = 1250 buckets. > * > * @param desiredCapacity an <code>int</code> value > * > */ > protected void setUp(int desiredCapacity) { > int numbuckets; > int numitems; > > // the most senseful version. do what we are asked to do > numitems = desiredCapacity; > > // compute number of buckets > numbuckets = Math.round( desiredCapacity / loadFactor); > > // allocate buckets > buckets = new int[numbuckets]; > for( int i=0; i<buckets.length; i++) { > buckets[i]=-1; // the indexing-counterpart for null > } > > // pre-allocate memory for the maximum number of entries > entries = new Entries( numitems); > > // initialise counter, how many elements there are in the map. > this._itemCtr = 0; > > if(true) > System.out.println("" + getClass().getName() + ": alloc > desired for " > + desiredCapacity + " items. " > + "actually allocated " > + buckets.length + " buckets / " > + entries.entriesDotNext.length + " entries " > ); > } > > > /** > * previously, this was called the hash-function. > * but it is not; it just computes the bucket-index. > * <p> > * the hash-function would be a method of the key; for keys of type > long, > * which we do have here, the hashfunction simply is the long-value > itself. > */ > protected int bucketindex(long key) { > int h = (int)(key % buckets.length); > if(h < 0) { > h = -h; > } > return h; > } > > > /** > * resize to a new size. > * the new size is taken as is, without checking for primeness or > whatever. > * in case of bad input, some java runtimeexception may occur, > * e.g. NegativeArraySizeException > */ > protected void resize( int newcapacity) { > // save old stuff > Entries oldentries = this.entries; > int[] oldbuckets = buckets; > > System.out.println("resize: newcap = " + newcapacity); > > // allocate for new size > setUp(newcapacity); > > > // now, re-fill > for( int b=0; b<oldbuckets.length; b++) { > for(int entry = oldbuckets[b]; entry != -1; entry = > oldentries.getNext(entry) ) { > //trace("check index " + entry + ", in bucket " + > h); > long xkey = oldentries.getKey(entry); > add( xkey); > } > } > } > > > > /** > * @see > org.apache.commons.collections.primitives.LongCollection#contains(long) > */ > // > // public interface > // > > > > /** > * Check if the key is in the Set. > */ > public boolean contains(long key) { > int h = bucketindex(key); > for (int entry = buckets[h]; entry != -1; entry = > entries.getNext(entry) ) { > if (entries.getKey(entry) == key) { > return true; > } > } > return false; > } > > > /** > * @see > org.apache.commons.collections.primitives.LongCollection#add(long) > */ > public boolean add(long key) { > > // check if key is already in use > int h = bucketindex(key); > for (int entry = buckets[h]; entry != -1; entry = > entries.getNext(entry)) { > if (entries.getKey(entry) == key) { //if (entry.key > == key) { > return false; // not changed > } > } > > // resize if we're full > if(this.entries.freelist == -1) { > > //assert: number of items in map == arraysize in > entries-arrays > if( entries.entriesDotKey.length != _itemCtr) { > throw new IllegalStateException("assertion > violated: " > + "number of items in map (" + _itemCtr > + ")" > + "!=" > + "arraysize in entries-arrays: (" + > entries.entriesDotKey.length + ")" > ); > } > > int oldcapacity = this._itemCtr; > int newcapacity = 2*oldcapacity; > resize( newcapacity); > // h depends on hashsize, so we must re-compute it!!! > h = bucketindex(key); > } > > // add the new entry > this._itemCtr++; > int entry = entries.makeEntry(key); > entries.setNext( entry, buckets[h]); > buckets[h] = entry; > > return true; // Set has changed > } > > > /** > * Remove the specified key from the Set. > * > * @see > org.apache.commons.collections.primitives.LongCollection#removeElement(long) > */ > public boolean removeElement(long key) { > int h = bucketindex(key); > > int entry = buckets[h]; > int prev = -1; > while(entry != -1) { > if(entries.getKey(entry) != key) { > // not the key we want > prev = entry; > entry = entries.getNext(entry); > } else { > // found key > if (prev == -1) { > //first in list > buckets[h]=entries.getNext(entry); > } else { > entries.setNext(prev, > entries.getNext(entry)); > } > // put item in free-list > entries.delete(entry); > > _itemCtr--; > > return true; // set has changed > } > } > > // key not found -> set not changed > return false; > > } > > > /** > * provide an iterator over the set. > */ > public LongIterator iterator() { > LongIterator iter = new LocalIterator(); > return iter; > } > > > // > // informal public interface > // > > > /** > * print some statistics > */ > public void dumpStats() { > int used=0; > final int MAX=25; > int maxdep=0; > int maxdepctr=0; > int[] stat = new int[MAX+1]; > for( int ctr=0; ctr<buckets.length; ctr++) { > if(buckets[ctr] != -1) { > used++; > int len=0; > int next=buckets[ctr]; > while(next!=-1) { > len++; > next=entries.getNext(next); > } > if(len>maxdep) { > maxdep=len; > maxdepctr=1; > } else if(len==maxdep) { > maxdepctr++; > } > if(len<MAX) { > stat[len]++; > } else { > stat[MAX]++; > } > } > } > System.out.println("number of used buckets " + used); > for(int i=0; i<MAX; i++) { > if(stat[i]!=0) System.out.println("stat[" + i + "]=" + > stat[i]); > } > if(stat[MAX]!=0) System.out.println("stat[>" + MAX + "]=" + > stat[MAX]); > System.out.println("maxdep=" + maxdep + " (" + maxdepctr + " > times)"); > > //System.out.println("used bucket ratio " + > ((used*100.0)/capacity) + "%"); > > int capacity=entries.entriesDotNext.length; > System.out.println( "capacity " + capacity); > System.out.println( "number of buckets " + buckets.length); > System.out.println( "loadfactor (fact) " + ((capacity > *100.0)/buckets.length) + "%"); > System.out.println( "loadfactor (should)" + ( loadFactor > *100.0) + "%"); > System.out.println( "number of entries " + _itemCtr); > System.out.println( "actual loadfactor " + ((_itemCtr > *100.0)/capacity) + "%"); > //System.out.println("ratio " + ((size *100.0)/capacity) + > "%"); > System.out.println( "entries per bucket " + (_itemCtr * 1.0 / > buckets.length) + ""); > } > > > /** > * print contents, plus some statistics > */ > public void dump() { > System.out.println("" > + "contents of " > + getClass().getName() > //+ this > ); > //## > LongIterator iter = iterator(); > while(iter.hasNext()) { > System.out.println( "item: " + iter.next() ); > } > dumpStats(); > } > > > // > // > // > > > /** > * use arrays of the appropriate type for key, and next. > * <br>for key, use whatever > * <br>for next, rather than using Entry (a ref to another Entry), > * we use int as index to the 'virtual' Entry. > * -1 is the pendant for a null-Entry. > * <br> > * we cannot rely on GC, so we must also > * <li>have a free-list > * <li>have a method delete() that must be manually invoked when an > Entry is not needed any more > * <br>since Entry objects are only internal, this 'ugliness' is not > visible outside, > * and the memory- and performance increase is worth the 'why don't > you use Entry-objects'-code. > */ > private class Entries { > > private int freelist; > > private long[] entriesDotKey; > private int[] entriesDotNext; > > > /** > * Constructor for Entries. > */ > Entries(int capacity) { > entriesDotKey = new long[capacity]; > entriesDotNext = new int[capacity]; > > // init the free-list; it contains all items, > freelist=0; > for(int i=0; i<entriesDotNext.length; i++) { > entriesDotNext[i]=i+1; > } > entriesDotNext[entriesDotNext.length-1]=-1; > } > > > /** > * return a 'reference' to a new virtual Entry. > * <p> > * it is assumed that there is always some entry available. > * if not, resize() should have been invoked right before > invoking this method. > */ > int makeEntry(long key) { > if(freelist==-1) { > throw new IllegalStateException("free list is > empty"); > } > int i=freelist; > freelist = entriesDotNext[i]; > entriesDotNext[i] = -1; > entriesDotKey[i] = key; > > return i; // remember, the index represents the > Entry-reference > } > > > /** > * mark an entry unused. > * <li>wipe out key and value > * <li>add to freelist > */ > void delete( int index) { > entriesDotKey[index]=0; // wipe out key-value > entriesDotNext[index]=freelist; // add as new head of > free-list > freelist=index; > } > > // set- and get- methods for the virtual Entries > > long getKey(int index) { > return entriesDotKey[index]; > } > void setKey(int index, long key) { > entriesDotKey[index] = key; > } > > int getNext(int index) { > return entriesDotNext[index]; > } > void setNext(int index, int next) { > entriesDotNext[index] = next; > } > > } > > > > // > // iterator stuff > // > > > private class LocalIterator implements LongIterator { > > > //private LongMapNoentryclass map; > > > /** > * reference to the bucket holding the 'next' item. > * <br> > * It is the index into [EMAIL PROTECTED] #buckets}. > */ > private int next_bucketidx = 0; > > //private int current_bucketidx = 0; we would need it for a > high-performance remove() > > > /** > * reference to the item that [EMAIL PROTECTED] #next()} would return. > * <br> > * It is the index into [EMAIL PROTECTED] super#entries}. > */ > private int next_entryidx = -1; > > > /** > * reference to the current item (the one that [EMAIL PROTECTED] #remove > ()} would remove). > * <br> > * the difference to [EMAIL PROTECTED] #entryidx} is: > * <li>[EMAIL PROTECTED] #current_entryidx} is changed by invocations of > [EMAIL PROTECTED] #next()} only > * <li>[EMAIL PROTECTED] #next_entryidx} is changed by invocations of > [EMAIL PROTECTED] #next()} as well as [EMAIL PROTECTED] #hasNext()} > */ > private int current_entryidx = -1; > > > /** > * constructor. no need to pass the map, since we're an inner > class, > * and so we have access to our surrounding map anyway. > * > * the iterator is positioned 'before the first item'. > * this means: > * <li>must invoke hasNext() to handle the case of an empty map > * <li>must invoke next() at least once, before getKey() and > getValue() may be used. > */ > private LocalIterator( /* LongMapNoentryclass map */ ) { > //this.map = map; > advance(); > current_entryidx = -1; > } > > > /** > * @see com.knuti.util.nativecollections.LongIterator#hasNext() > */ > public boolean hasNext() { > return next_entryidx != -1; > } > > > /** > * internal method, advance to next element. > * on return, [EMAIL PROTECTED] #next_entryidx} 'references' the > next-item. > * if there is no more, [EMAIL PROTECTED] #next_entryidx} will be -1 > */ > protected void advance() { > // advance in bucket, if possible > if(next_entryidx!=-1) { > next_entryidx = entries.getNext(next_entryidx); > } > // find non-empty bucket (or, do nothing if bucket did > have next entry) > while( (next_entryidx == -1) && (next_bucketidx < > buckets.length) ) { > next_entryidx = buckets[next_bucketidx]; > next_bucketidx++; > } > } > > > /** > * same as in <code>java.util.Iterator#next()</code>, > * except for the return type. > * > * @see com.knuti.util.nativecollections.MapEntryIterator#next > () > * > */ > public long next() { > if(!hasNext()) { > throw new NoSuchElementException(); > } > current_entryidx = next_entryidx; // position the > 'finger' > advance(); // > pre-compute 'next' > long res = entries.getKey( current_entryidx); > return res; > } > > > /** > * Remove the current element. > * > * @see > com.knuti.util.nativecollections.MapEntryIterator#remove() > * > * @throws UnsupportedOperationException > * > */ > public void remove() { > //if(true) throw new UnsupportedOperationException(); > if (current_entryidx == -1) { > throw new IllegalStateException( "no current > element"); > } else { > long key=entries.getKey( current_entryidx); > LongSet.this.removeElement( key); > current_entryidx = -1; > } > } > > } > > > /** > * mini test, does nothing very useful > */ > public static void main(String[] args) { > > // initial capacity 17, > // allow up to 3,14 entries per bucket before re-hashing. > // i.e. start with a hashtable of size 17/3.14 = 5 > LongSet x = new LongSet(17,3.14f); > > // test 1: store a lot of entries. > // key: remainder mod 997 > for (int num = 12345; num < 123456; num+=31) { > long key=num%99997; > //long key=num%number; > //long key=num/number; > boolean changed=x.add( key); > if(changed) { > System.out.println( "new:" + key + " > <--------------------"); > } else { > System.out.println( "old:" + key + " (remove > now)"); > x.removeElement( key); > } > } > // show what we have > x.dump(); > } > } > - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - > - - - - - - - > > > knuti > (despite my email address, i am writing this as a private person) > > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > -- - Rod <http://radio.weblogs.com/0122027/> --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
