On Tue, Mar 12, 2013 at 12:52 PM, Dawid Weiss <[email protected]> wrote:
> > Why would you say fastutil more than hppc?
>
> Oh, I like HPPC very much, although I wrote it so I may not be
> completely objective here :)
>
> And seriously, I recommended fastutil because Mahout is primarily
> computational, so I reckon it would be nice to have a collections
> package that would be (to the extent possible) compatible with
> standard Java collections. So that, if you really wanted to, you could
> compute something and get a map or a list of, say, primitive integers
> but then pass it on to regular Guava collections filters, do something
> there, write to disk using yet another package, etc. In short:
> interoperability.
>

The primary use case for Mahout collections is directly *inside* our
Vector interface. That is, they're not directly exposed to most users, and
we don't really expose the ability to do Guava collections stuff on them at
all: We Do Math. :) So in particular, we don't expose the interface to the
user, and we really do want the inner-loop power of fast, high-performance
primitive code (or else why bother with a primitives library at all?)

> The reason I wrote HPPC was primarily because (at the time) fastutil
> was LGPL'd, which was a showstopper for me.
> But then Sebastiano changed the license; we exchanged a good few ideas
> and this resulted in code-swapping, so that now both libraries are (in
> parts) very similar internally. Read: fastutil is fast, so is HPPC. I
> currently use both, depending on which one I feel is a better fit for a
> particular project. HPPC is typically nicer if you want to go really
> low-level or if you want to use (and get used to) its different
> iterators/container structure, etc. Fastutil has more data structures to
> pick from (but this comes at a price when you compare the JAR sizes).
>

Yeah, the jars get big in fastutil. We don't really need that many data
structures, most of the time. Maps, lists... and even of those, we only use
a few.
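Dawid's interoperability point can be illustrated with a minimal sketch, assuming a hand-rolled class rather than any real library: PrimitiveIntList and its getInt/addInt methods are hypothetical names, not fastutil's, HPPC's, or Mahout's API. The class stores values in an int[] and offers a primitive path for inner loops, while extending java.util.AbstractList<Integer> so the same object can be handed to any code that expects a java.util.List.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Hypothetical sketch (not fastutil's, HPPC's, or Mahout's API): a growable list
// of primitive ints that is also a java.util.List<Integer> via AbstractList.
final class PrimitiveIntList extends AbstractList<Integer> {
    private int[] buffer = new int[8]; // backing storage, no Integer objects
    private int size = 0;              // number of elements actually in use

    // Primitive read for hot inner loops: no boxing.
    public int getInt(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return buffer[index];
    }

    // Primitive append: doubles the backing array when it is full.
    public void addInt(int value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
        buffer[size++] = value;
        modCount++; // lets AbstractList's iterators detect concurrent modification
    }

    // java.util.List view: boxes each value so JDK or Guava code can consume it.
    @Override
    public Integer get(int index) {
        return getInt(index);
    }

    @Override
    public int size() {
        return size;
    }

    // Also accept appends through the List interface (unboxes; null throws NPE).
    @Override
    public boolean add(Integer value) {
        addInt(value);
        return true;
    }
}
```

In this sketch the boxing cost is paid only on the java.util.List path (get and add), so a loop calling getInt(i) stays primitive while, for example, java.util.Collections.max(list) still works on the same object. Mahout's Vector use case, as Jake describes it, would only ever need the primitive path.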
>
> I don't mind Mahout's own collection package either, but I think it'd
> be a waste of time to develop a completely identical version of HPPC
> or fastutil (or Trove, or... you name it). You guys are darn smart in
> other areas and your time will be better spent on things folks like me
> have a very vague idea of ;)

Well, nobody's suggesting writing *another* primitives library. Just that
we already rely on our own, there are some things missing from it (iterators
are all I can think of right now), and it needs more extensive unit
testing. The question is whether there's anything to be gained by just
swapping our own collections *out* for something else, like HPPC or
fastutil.

> Dawid
>
> >
> > Currently all we use in Mahout is lists and hashmaps, and we don't
> > even currently have proper iteration over the latter, so we certainly
> > don't depend on Collections compatibility...
> >
> >
> > On Tue, Mar 12, 2013 at 12:03 PM, Dawid Weiss
> > <[email protected]> wrote:
> >
> >> > Indeed. We have considered switching in the past, but the momentum
> >> > never developed.
> >>
> >> Exactly. Should somebody find the time to do the switch at some point,
> >> I'd say fastutil would be more appropriate for Mahout since it is
> >> Collections-compatible and contains more variants of common
> >> data structures, which may be handy in the future.
> >>
> >> Dawid
> >>
> >
> > --
> >
> >   -jake

--

  -jake
