On Mon, Aug 3, 2009 at 8:52 PM, Grant Ingersoll<gsing...@apache.org> wrote:
> Why is long less flexible?  I mean, I get that Comparable is an interface
> and thus can be most anything, but really Taste just needs a way of
> identifying something uniquely right?  long satisfies that, no?

Really, it's that Strings are possible now too (and in theory other
stuff, but those would be by far the most common non-numeric type).
Yes, the framework doesn't care what it is. Right now I can have keys
like "A09BC3", and this change would make that impossible. You'd
have to maintain, separately, a mapping between your keys and some
numeric identifier.
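
To make the cost of that concrete, here is a minimal sketch of the kind of
side mapping a caller would need if the core only accepted long IDs. The
class StringIdMapper and its methods are hypothetical, invented for
illustration; nothing like this is in Taste.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of Taste): assigns a dense long ID to each String key. */
final class StringIdMapper {
    private final Map<String, Long> toId = new HashMap<>();
    private final List<String> toKey = new ArrayList<>();

    /** Returns the ID already assigned to key, or assigns the next unused one. */
    long idFor(String key) {
        Long existing = toId.get(key);
        if (existing != null) {
            return existing;
        }
        long id = toKey.size(); // IDs are handed out as 0, 1, 2, ...
        toId.put(key, id);
        toKey.add(key);
        return id;
    }

    /** Reverse lookup: the String key that was assigned this ID. */
    String keyFor(long id) {
        return toKey.get((int) id);
    }
}
```

The caller would translate "A09BC3" or 'srowen' to a long on the way in and
back to a String on the way out, and would have to persist the mapping
somewhere if IDs need to stay stable across runs.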

> I guess my question is mainly along the lines of how users interact with
> said id.  I would suspect it is then used as a key into a database or a map
> or something like that right?  Are they going to be then forced to
> constantly box it to Long?   I think it is reasonable to push those

(To be clear I'm suggesting long primitives, not Long objects -- the
point being to avoid the Object overhead entirely.)

My impression is also that the key is usually *numeric*, which is why
this could be a reasonable thing to assume.

> questions out to users to answer while focusing on being as lean as
> possible.  After all, Taste is a library, not an application, so in order
> for it to appeal to a broad set of users, it needs to be lean and fast and
> make as few assumptions as possible.  Comparable is, in some sense, a bigger
> assumption than long. Perhaps that stuff can be layered on top while the
> core just uses long.

I suppose I view long as a stronger assumption -- more limiting --
since it assumes you use a numeric type for keys, as opposed to merely
assuming it is something with an ordering, which could be a String or
a... or a... well String is really the only other imaginable, common
use case. Before you could use a user name like 'srowen' as an ID and
now the assumption means you can't.

> Also, are there other places where memory could be saved first?

This is definitely next up on the list of memory consumers. Right now
roughly half the heap is storing arrays of Integers, in my particular
test case (but one that's pretty representative). And if those take 32
bytes each compared to 8 bytes for a long (which has a satisfyingly
larger range to boot), that half of the heap shrinks by 75%, so we're
looking at a 37.5% overall savings or so.
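
The 37.5% figure is just the heap share times the per-element reduction.
A quick sketch of that arithmetic, using only the numbers quoted above (the
class and method names are made up for illustration):

```java
/** Back-of-the-envelope estimate; names here are illustrative only. */
final class HeapSavingsEstimate {
    /**
     * Fraction of the whole heap saved when a component occupying heapShare
     * of the heap shrinks from bytesBefore to bytesAfter per element.
     */
    static double overallSavings(double heapShare, double bytesBefore, double bytesAfter) {
        return heapShare * (1.0 - bytesAfter / bytesBefore);
    }

    public static void main(String[] args) {
        // Half the heap, 32 bytes per ID today, 8 bytes as a primitive long.
        System.out.println(overallSavings(0.5, 32, 8)); // prints 0.375
    }
}
```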

Then the overhead is really the 'indexes' like the specialized Maps
(which already do linear probing instead of separate chaining -- no
Map.Entry objects). Their storage would come down somewhat too.
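
For anyone unfamiliar with the distinction, below is a rough sketch of what
a linear-probing map looks like once the keys are primitive longs. It is
not the actual Taste implementation; LongIntProbingMap, its fixed capacity
and its sentinel value are all assumptions made to keep the example short.

```java
import java.util.Arrays;

/** Illustrative only: fixed-capacity long-to-int map, open addressing with linear probing. */
final class LongIntProbingMap {
    private static final long EMPTY = Long.MIN_VALUE; // sentinel; this key cannot be stored
    private final long[] keys;   // keys live directly in the array, no Entry objects
    private final int[] values;
    private int size;

    LongIntProbingMap(int capacity) {
        if (capacity < 2) {
            throw new IllegalArgumentException("capacity must be at least 2");
        }
        keys = new long[capacity];
        values = new int[capacity];
        Arrays.fill(keys, EMPTY);
    }

    /** Index holding key, or the first empty slot on its probe sequence. */
    private int slotFor(long key) {
        int hash = (int) (key ^ (key >>> 32));
        int i = (hash & 0x7FFFFFFF) % keys.length;
        // Linear probing: on a collision, just try the next slot.
        while (keys[i] != EMPTY && keys[i] != key) {
            i = (i + 1) % keys.length;
        }
        return i;
    }

    void put(long key, int value) {
        if (key == EMPTY) {
            throw new IllegalArgumentException("reserved key");
        }
        int i = slotFor(key);
        if (keys[i] == EMPTY) {
            // Always keep one slot empty so probing terminates.
            if (size == keys.length - 1) {
                throw new IllegalStateException("map is full");
            }
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    /** Returns the stored value, or defaultValue if key is absent. */
    int get(long key, int defaultValue) {
        if (key == EMPTY) {
            return defaultValue;
        }
        int i = slotFor(key);
        return keys[i] == key ? values[i] : defaultValue;
    }

    int size() {
        return size;
    }
}
```

Lookups compare primitives with == and never call hashCode() or equals()
on an object, and the per-entry cost is one long plus one int with no
object header or pointer.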

I am also sure performance would increase -- avoiding millions and
millions of method calls for hashCode() and equals() and compareTo()
at least. Not to mention less GC pressure.

I don't know of other 'big wins'. Well, I can think of some that are
specific to slope-one, but those are less interesting than the ones
that could affect many implementations -- those affecting the model.
