I'm not among the knowledgeable, just the interested ;) I also vote +1 for long. Since Taste is a helper for building a recommendation system, to me a gain in performance matters much more than flexibility. Once the recommended item IDs are obtained (generally just 5 or 10 items), everything else can be done pretty quickly.
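To picture the separate key-to-ID mapping mentioned in the quoted thread, here is a minimal sketch. `KeyIdMapper` and its method names are made up for illustration and are not part of Taste. It hands each String key the next unused long and keeps a list for translating the few recommended IDs back into keys.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not a Taste class): maps String keys to dense long IDs and back. */
final class KeyIdMapper {
    // Boxing happens only here, at the edge, not inside the recommender's data model.
    private final Map<String, Long> idsByKey = new HashMap<>();
    private final List<String> keysById = new ArrayList<>();

    /** Returns the ID already assigned to this key, or assigns the next unused one. */
    long toId(String key) {
        Long existing = idsByKey.get(key);
        if (existing != null) {
            return existing;
        }
        long id = keysById.size();
        idsByKey.put(key, id);
        keysById.add(key);
        return id;
    }

    /** Translates an ID (for example a recommended item ID) back to the original key. */
    String toKey(long id) {
        if (id < 0 || id >= keysById.size()) {
            throw new IllegalArgumentException("Unknown ID: " + id);
        }
        return keysById.get((int) id);
    }
}
```

With only 5 or 10 recommended IDs to translate back per request, the `toKey` lookups should be negligible next to the recommendation itself.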
Regards,

On Mon, Aug 3, 2009 at 5:22 PM, Sean Owen <sro...@gmail.com> wrote:

> On Mon, Aug 3, 2009 at 8:52 PM, Grant Ingersoll<gsing...@apache.org> wrote:
> > Why is long less flexible? I mean, I get that Comparable is an interface
> > and thus can be most anything, but really Taste just needs a way of
> > identifying something uniquely right? long satisfies that, no?
>
> Really, it's that Strings are possible now too (and in theory other
> stuff, but those would be by far the most common non-numeric type).
> Yes, the framework doesn't care what it is. Right now I can have keys
> like "A09BC3" and now this change would make that impossible. You'd
> have to maintain, separately, a mapping between your keys and some
> numeric identifier, if this were the case.
>
> > I guess my question is mainly along the lines of how users interact with
> > said id. I would suspect it is then used as a key into a database or a map
> > or something like that right? Are they going to be then forced to
> > constantly box it to Long? I think it is reasonable to push those
>
> (To be clear I'm suggesting long primitives, not Long objects -- the
> point being to avoid the Object overhead entirely.)
>
> I also perceive it's usually a *numeric* key which is why this could
> make sense to assume.
>
> > questions out to users to answer while focusing on being as lean as
> > possible. After all, Taste is a library, not an application, so in order
> > for it to appeal to a broad set of users, it needs to be lean and fast and
> > make as few assumption as possible. Comparable is, in some sense, a bigger
> > assumption than long. Perhaps that stuff can be layered on top while the
> > core just uses long.
>
> I suppose I view long as a stronger assumption -- more limiting --
> since it assumes you use a numeric type for keys, as opposed to merely
> assuming it is something with an ordering, which could be a String or
> a... or a... well String is really the only other imaginable, common
> use case. Before you could use a user name like 'srowen' as an ID and
> now the assumption means you can't.
>
> > Also, are there other places where memory could be saved first?
>
> This is definitely next up on the list of memory consumers. Right now
> roughly half the heap is storing arrays of Integers, in my particular
> test case (but one that's pretty representative). And if those take 32
> bytes compared to 8 bytes for a long (which has a satisfyingly larger
> range to boot), looking at a 37.5% overall savings or so.
>
> Then the overhead is really the 'indexes' like the specialized Maps
> (which already do linear probing instead of separate chaining -- no
> Map.Entry objects). Their storage would come down somewhat too.
>
> I am also sure performance would increase -- avoiding millions and
> millions of method calls for hashCode() and equals() and compareTo()
> at least. Not to mention less GC pressure.
>
> I don't know of other 'big wins'. Well, can think of some that are
> specific to slope-one, but those are less interesting than the ones
> that could affect many implementations -- those affecting the model.
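For what it's worth, the 37.5% figure quoted above can be reproduced with a back-of-the-envelope calculation: if half the heap holds keys that shrink from 32 bytes to 8 bytes each, the heap shrinks by 0.5 * (1 - 8/32). The class and method names below are hypothetical, and the sketch assumes the rest of the heap stays the same size.

```java
/** Hypothetical back-of-the-envelope helper, not part of Taste. */
final class HeapSavingsEstimate {
    /**
     * Fraction of the whole heap saved when the portion given by fractionOfHeap
     * shrinks from bytesBefore to bytesAfter per element and nothing else changes.
     */
    static double overallSavings(double fractionOfHeap, double bytesBefore, double bytesAfter) {
        return fractionOfHeap * (1.0 - bytesAfter / bytesBefore);
    }

    public static void main(String[] args) {
        // Figures from the quoted message: about half the heap, 32 bytes down to 8 bytes.
        System.out.println(overallSavings(0.5, 32.0, 8.0));
    }
}
```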