Dawid,

There is a model compromise out there: the Trove 'decorator' approach.
I'm perfectly happy to follow that model to give people whatever value
you can get from Java collection compatibility. I confess that I've
been considering using it as an excuse to learn the CGM library and
generate the code by magic, but I suspect that there will be a giant
retching noise at that suggestion.

Anyway, my initial goal is just to clean up and fill out the tight
APIs, and that's why you see such a paucity of compatibility up to
this point.

--benson



On Tue, Jan 12, 2010 at 7:19 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:
> Hi guys,
>
> I see Benson working really hard on converting Colt primitive
> collections to Mahout -- this is great effort, really, since no such
> library currently exists with an Apache or BSD license.
>
> I wanted to ask you if compatibility with Java Collections is
> something you consider crucial for a set of collection classes. There
> are pros and cons of this compatibility.
>
> 1) compatibility with the Java Collections API gives you tons of other
> libraries (Google Collections, for example) which you can use out of
> the box with primitives,
>
> 2) compatibility with the Java Collections API means boxing/unboxing
> conversions on standard API calls and awkward other methods, should
> you wish to avoid such conversions,
>
> 3) collection classes have many strong contracts, including
> fast-failing iterators, etc. These are largerly unnecessary for
> computational code.
>
> 4) you may be fond of certain idioms you might have grown accustomed
> to (subList().clear()).
>
> 5) resigning from certain API elements can yield much faster code
> (resigning from bounds check, exposing the internal implementation of
> a given type for custom processing).
>
> I'm asking because we at Carrot Search have been working on a similar
> library for managing basic collection types for primitives (and
> generics), namely:
>
> - hash maps (open addressing),
> - sets (open addressing),
> - efficient bit set and bit operations (we imported stuff from Lucene),
> - stacks, dequeues, arrays.
>
> Our line of thinking eventually led us to create a library that is
> MOSTLY API-compatible, but NOT interface compatible. That is, for
> example, you get put(byte, int) methods on a hash map specialized for
> byte keys and int values, but this hash map does not implement
> Map<Byte, Integer>. It is therefore not a general purpose Java
> Collections replacement, but for computational code we found our
> implementation very efficient and straightforward.
>
> I have the code ready, tested and we'd be willing to contribute this
> entirely to the Apache foundation. The upside is that it's
> royalty-free (white box reimplementation of basic data structures).
> Some of the code was borrowed from Lucene (BitSet) and the method of
> open addressing is quadratic rather than double-hashing, which was
> inspired by the work done on Google sparse hashes.
>
>  I hope Benson won't feel offended -- he's done a great job working on
> Colt's code, but if you think the above assumptions are fine (the
> primary one being breaking the compatibility with Java Collections),
> then perhaps we should apply for a commons sub-project (we currently
> call this library "high performance primitive collections") and join
> the efforts under a single umbrella for everyone's benefit?
>
> Dawid
>

Reply via email to