Dawid,

There is a model compromise out there: the Trove 'decorator' approach. I'm perfectly happy to follow that model to give people whatever value they can get from Java Collections compatibility. I confess that I've been considering using it as an excuse to learn the CGM library and generate the code by magic, but I suspect that there will be a giant retching noise at that suggestion.

Anyway, my initial goal is just to clean up and fill out the tight APIs, and that's why you see such a paucity of compatibility up to this point.

--benson

On Tue, Jan 12, 2010 at 7:19 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
> Hi guys,
>
> I see Benson working really hard on converting Colt primitive
> collections to Mahout -- this is a great effort, really, since no such
> library currently exists with an Apache or BSD license.
>
> I wanted to ask you if compatibility with Java Collections is
> something you consider crucial for a set of collection classes. There
> are pros and cons of this compatibility.
>
> 1) compatibility with the Java Collections API gives you tons of other
> libraries (Google Collections, for example) which you can use out of
> the box with primitives,
>
> 2) compatibility with the Java Collections API means boxing/unboxing
> conversions on standard API calls and awkward extra methods, should
> you wish to avoid such conversions,
>
> 3) collection classes have many strong contracts, including
> fail-fast iterators, etc. These are largely unnecessary for
> computational code.
>
> 4) you may be fond of certain idioms you have grown accustomed
> to (subList().clear()).
>
> 5) giving up certain API elements can yield much faster code
> (dropping bounds checks, exposing the internal implementation of
> a given type for custom processing).
>
> I'm asking because we at Carrot Search have been working on a similar
> library for managing basic collection types for primitives (and
> generics), namely:
>
> - hash maps (open addressing),
> - sets (open addressing),
> - efficient bit set and bit operations (we imported stuff from Lucene),
> - stacks, deques, arrays.
>
> Our line of thinking eventually led us to create a library that is
> MOSTLY API-compatible, but NOT interface compatible.
> That is, for
> example, you get put(byte, int) methods on a hash map specialized for
> byte keys and int values, but this hash map does not implement
> Map<Byte, Integer>. It is therefore not a general purpose Java
> Collections replacement, but for computational code we found our
> implementation very efficient and straightforward.
>
> I have the code ready, tested and we'd be willing to contribute this
> entirely to the Apache foundation. The upside is that it's
> royalty-free (white box reimplementation of basic data structures).
> Some of the code was borrowed from Lucene (BitSet) and the method of
> open addressing is quadratic rather than double-hashing, which was
> inspired by the work done on Google sparse hashes.
>
> I hope Benson won't feel offended -- he's done a great job working on
> Colt's code, but if you think the above assumptions are fine (the
> primary one being breaking the compatibility with Java Collections),
> then perhaps we should apply for a commons sub-project (we currently
> call this library "high performance primitive collections") and join
> the efforts under a single umbrella for everyone's benefit?
>
> Dawid
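
To make the two designs discussed above concrete, here is a minimal sketch of what "API-compatible but not interface-compatible" might look like, together with a Trove-style decorator layered on top. Everything in it is an assumption made for illustration: the class names ByteIntMap and ByteIntMapDecorator, the fixed 512-slot table, the hash multiplier, and the triangular-number form of quadratic probing are all hypothetical and are not taken from the Carrot Search code, Trove, or Mahout.

```java
import java.util.AbstractMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical primitive map: byte keys, int values, no boxing, does NOT implement java.util.Map. */
final class ByteIntMap {
    // Power-of-two capacity. There are only 256 distinct byte keys, so the
    // table is never more than half full and never needs resizing.
    private static final int CAPACITY = 512;
    private final byte[] keys = new byte[CAPACITY];
    private final int[] values = new int[CAPACITY];
    private final boolean[] used = new boolean[CAPACITY];
    private int size;

    /** Returns the slot holding key, or the first free slot on its probe path. */
    private int slotFor(byte key) {
        int slot = (key * 0x9E3779B9) >>> 23; // keep the top 9 bits: 0..511
        // Quadratic probing with triangular offsets (1, 3, 6, ...), which
        // visits every slot of a power-of-two table.
        for (int step = 1; used[slot] && keys[slot] != key; step++) {
            slot = (slot + step) & (CAPACITY - 1);
        }
        return slot;
    }

    /** Stores value under key; returns true if the key was not present before. */
    boolean put(byte key, int value) {
        int slot = slotFor(key);
        boolean isNew = !used[slot];
        if (isNew) {
            used[slot] = true;
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
        return isNew;
    }

    int getOrDefault(byte key, int fallback) {
        int slot = slotFor(key);
        return used[slot] ? values[slot] : fallback;
    }

    boolean containsKey(byte key) { return used[slotFor(key)]; }

    int size() { return size; }
}

/** Hypothetical decorator: exposes a ByteIntMap as Map<Byte, Integer>, boxing only at this boundary. */
final class ByteIntMapDecorator extends AbstractMap<Byte, Integer> {
    private final ByteIntMap delegate;

    ByteIntMapDecorator(ByteIntMap delegate) { this.delegate = delegate; }

    @Override public Integer put(Byte key, Integer value) {
        Integer old = get(key);
        delegate.put(key, value); // unboxes to put(byte, int)
        return old;
    }

    @Override public Integer get(Object key) {
        if (!(key instanceof Byte)) return null;
        byte k = (Byte) key;
        return delegate.containsKey(k) ? Integer.valueOf(delegate.getOrDefault(k, 0)) : null;
    }

    /** Snapshot of the entries; a real decorator would expose a live view instead. */
    @Override public Set<Map.Entry<Byte, Integer>> entrySet() {
        Set<Map.Entry<Byte, Integer>> entries = new HashSet<>();
        for (int k = Byte.MIN_VALUE; k <= Byte.MAX_VALUE; k++) {
            byte b = (byte) k;
            if (delegate.containsKey(b)) {
                entries.add(new AbstractMap.SimpleImmutableEntry<Byte, Integer>(b, delegate.getOrDefault(b, 0)));
            }
        }
        return entries;
    }
}
```

Computational code would call put(byte, int) on ByteIntMap directly and never box, while code that needs a java.util.Map (for Google Collections and the like) would wrap the same instance in the decorator and pay for boxing only there. The sketch leaves out removal, which in an open-addressing table would need tombstones or re-insertion of the probe chain.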