On Tue, Mar 12, 2013 at 12:52 PM, Dawid Weiss
<[email protected]> wrote:

> > Why would you say fastutil more than hppc?
>
> Oh, I like HPPC very much -- although I wrote it so I may not be
> completely objective here :)
>
> And seriously I recommended fastutil because Mahout is primarily
> computational so I reckon it would be nice to have a collections
> package that would be (to the extent possible) compatible with
> standard Java collections. So that, if you really wanted to, you could
> compute something and get a map or a list of, say, primitive integers
> but then pass it on to regular Guava collections filters, do something
> there, write to disk using yet another package etc. In short:
> interoperability.
>

The primary use case for Mahout collections is directly *inside* our
Vector interface.  Which is to say, they're not directly exposed to
most users, and we don't really offer the ability to do Guava collections
stuff on them at all: We Do Math. :)  So what we really want is the
inner-loop power of fast, high-performance primitives (or else why
bother with a primitives library at all?)
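
To illustrate the inner-loop point, here is a minimal sketch of a
primitive int list next to its boxed equivalent.  This is not Mahout,
HPPC, or fastutil code; the class IntListSketch and all of its methods
are hypothetical names made up for this example.  The primitive sum
walks an int[] directly, while the boxed version has to unbox an
Integer on every step, and toBoxedList() shows the copy you pay for
when you want to hand the data to standard Java collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Mahout, HPPC, or fastutil.
final class IntListSketch {
    private int[] data = new int[8];
    private int size = 0;

    // Appends a value, doubling the backing array when it is full.
    void add(int value) {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
        }
        data[size++] = value;
    }

    int size() {
        return size;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    // Inner loop over primitives: no Integer objects are created or unboxed.
    long sum() {
        long total = 0;
        for (int i = 0; i < size; i++) {
            total += data[i];
        }
        return total;
    }

    // Same computation over a standard collection: each element is unboxed.
    static long boxedSum(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    // Interoperability costs a copy: every int is boxed into an Integer.
    List<Integer> toBoxedList() {
        List<Integer> boxed = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            boxed.add(data[i]);
        }
        return boxed;
    }
}
```

A Collections-compatible library can give you both views without the
explicit copy, which is the trade-off being weighed in this thread.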


> The reason I wrote HPPC was primarily because (at the time) fastutil
> was LGPL'd which was a showstopper for me.
> But then Sebastiano changed the license; we exchanged a good few ideas
> and this resulted in code-swapping so that now both libraries are (in
> parts) very similar internally. Read: fastutil is fast, so is HPPC. I
> currently use both depending on which one I feel is a better fit for a
> particular project. HPPC is typically nicer if you want to go really
> low-level or if you want to (and get used to) its different iterators/
> container structure, etc. Fastutil has more data structures to pick
> from (but this comes at a price when you compare the JAR sizes).
>

Yeah, the jars get big in fastutil.  We don't really need that many data
structures, most of the time.  Maps, lists... and even of those, we only
use a few.


>
> I don't mind Mahout's own collection package either but I think it'd
> be a waste of time to develop a completely identical version of HPPC
> or fastutil (or trove, or... you name it). You guys are darn smart in
> other areas and your time will be better spent on things folks like me
> have a very vague idea of ;)


Well, nobody's suggesting writing *another* primitives library.  Just
that we already rely on our own, and there are some things missing from
it (iterators are all I can think of right now), and it needs some more
extensive unit testing.

Question is whether there's anything to be gained by just swapping
our own collections *out* for something else, like HPPC or fastutil.



> Dawid
>
> >
> > Currently all we use in Mahout is lists and hashmaps, and we don't
> > even currently have proper iteration over the latter, so we certainly
> > don't depend on Collections compatibility...
> >
> >
> > On Tue, Mar 12, 2013 at 12:03 PM, Dawid Weiss
> > <[email protected]> wrote:
> >
> >> > Indeed.  We have considered switching in the past, but the momentum
> >> > never developed.
> >>
> >> Exactly. Should somebody find the time to do the switch at some point
> >> I'd say fastutil would be more appropriate for Mahout since it is
> >> Collections-compatible and contains more different variants of common
> >> data structures which may be handy in the future.
> >>
> >> Dawid
> >>
> >
> >
> >
> > --
> >
> >   -jake
>



-- 

  -jake
