+1. Good point, Jake. I don't know of any such situations personally, but
it's something to keep an eye out for. If you or anybody else sees
instances of this, maybe you could add a TODO?
On 2/23/12 10:24 AM, Jake Mannix wrote:
Hey Devs.
Was prototyping some stuff in Mahout last night and noticed something
I'm not sure we've talked about before: because equals() on Vector
instances returns true iff the numeric values of the vectors are equal,
and we also have a consistent hashCode(), anytime you have a
HashMap<Vector, Anything>, all the operations you'd think are O(1)
(put, get, containsKey) are really O(vector.numNonZeroes()), since both
the hash and the equality check walk the vector's entries. I tried to
look through the codebase to see where we hang onto maps with vector
keys, and we do it sometimes.
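To make the cost concrete, here's a rough sketch. `ValueVector` is a hypothetical stand-in, not Mahout's Vector class. Its equals()/hashCode() depend on the numeric values, like the behavior described above, and it counts how many elements those methods read. Because it's dense, each hash costs n reads; for a Mahout sparse vector the cost would track numNonZeroes() instead.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a vector with value-based equals()/hashCode().
// Not Mahout's API; it only mimics the semantics discussed in this thread.
final class ValueVector {
    // Element reads charged to equals()/hashCode() so far.
    static long elementsVisited = 0;

    private final double[] values;

    ValueVector(double[] values) {
        this.values = values.clone();
    }

    @Override
    public int hashCode() {
        elementsVisited += values.length; // the hash reads every element
        return Arrays.hashCode(values);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true; // identity short-circuit: no element reads
        if (!(o instanceof ValueVector)) return false;
        ValueVector other = (ValueVector) o;
        if (values.length != other.values.length) return false;
        // Worst case reads every element; equal vectors always hit the worst case.
        elementsVisited += values.length;
        return Arrays.equals(values, other.values);
    }
}

class VectorKeyCost {
    public static void main(String[] args) {
        int n = 100_000; // a "big dense vector"
        ValueVector key = new ValueVector(new double[n]);
        Map<ValueVector, String> labels = new HashMap<>();

        ValueVector.elementsVisited = 0;
        labels.put(key, "doc-42");                       // one hashCode(): n reads
        System.out.println(ValueVector.elementsVisited); // prints 100000

        // An equal but distinct instance costs hashCode() + equals(): 2n more reads.
        String label = labels.get(new ValueVector(new double[n]));
        System.out.println(label + " " + ValueVector.elementsVisited); // prints "doc-42 300000"
    }
}
```

Even a lookup with the very same instance still pays the full hash, because HashMap calls hashCode() before it gets to its `==` check.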
Maybe we shouldn't? Most Vectors have identities (clusterId, documentId,
topicId, etc.) that we could key on instead... or maybe we should be
using IdentityHashMap, to enforce strict object identity and skip this
calculation entirely? This could be really slow for big dense vectors,
for instance.
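Here's a minimal sketch of the two alternatives, again using a hypothetical counting stand-in (`CountingVector`, with a made-up `docId` field standing in for documentId/clusterId). java.util.IdentityHashMap compares keys with `==` and hashes them with System.identityHashCode, so the vector's own equals()/hashCode() never run. Keying on an integer id keeps lookups cheap and also keeps normal equality semantics for the key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical value-equality vector (not Mahout's Vector) that counts
// how many elements its equals()/hashCode() read.
final class CountingVector {
    static long elementsVisited = 0;
    final int docId; // hypothetical identity, e.g. a document id
    private final double[] values;

    CountingVector(int docId, double[] values) {
        this.docId = docId;
        this.values = values.clone();
    }

    @Override public int hashCode() {
        elementsVisited += values.length;
        return Arrays.hashCode(values);
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof CountingVector)) return false;
        elementsVisited += values.length;
        return Arrays.equals(values, ((CountingVector) o).values);
    }
}

class KeyingAlternatives {
    public static void main(String[] args) {
        CountingVector v = new CountingVector(42, new double[100_000]);
        CountingVector.elementsVisited = 0;

        // Option 1: strict object identity; the vector's hashCode()/equals() are never called.
        Map<CountingVector, String> byIdentity = new IdentityHashMap<>();
        byIdentity.put(v, "cluster-7");
        String a = byIdentity.get(v);

        // Option 2: key on the vector's id instead of its contents.
        Map<Integer, CountingVector> byDocId = new HashMap<>();
        byDocId.put(v.docId, v);
        CountingVector b = byDocId.get(42);

        System.out.println(a + " " + (b == v) + " " + CountingVector.elementsVisited); // prints "cluster-7 true 0"

        // Trade-off: under identity semantics an equal-valued copy is a different key.
        System.out.println(byIdentity.get(new CountingVector(42, new double[100_000]))); // prints null
    }
}
```

The last line shows the catch with IdentityHashMap: any code that rebuilds a vector and expects to find it in the map would silently miss. Keying on ids avoids that.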
This looks like a really easy place to accidentally add heavy
complexity to things. Do we really want people to be checking
*mathematical* equals() on vectors with floating-point precision?
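On the floating-point point, here's a small sketch of why exact value equality is fragile for map keys. Two vectors that are mathematically equal but computed differently (0.3 versus 0.1 + 0.2) compare unequal, so a value-keyed lookup misses. `approxEquals` is a hypothetical helper, not an existing Mahout method. A tolerance check like it can't back a hashCode(), because "within tol" isn't transitive.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FloatKeyDemo {
    // Hypothetical helper: element-wise comparison within a tolerance.
    // Not usable as equals() for hashing: "within tol" is not transitive,
    // so nearly equal vectors may still hash to different buckets.
    static boolean approxEquals(double[] a, double[] b, double tol) {
        if (a.length != b.length) return false;
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > tol) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double[] stored = {0.3, 1.0};
        double[] recomputed = {0.1 + 0.2, 1.0}; // 0.1 + 0.2 is 0.30000000000000004 as a double

        System.out.println(Arrays.equals(stored, recomputed));      // prints false
        System.out.println(approxEquals(stored, recomputed, 1e-9)); // prints true

        // With value-based keys, the recomputed vector misses the stored entry.
        Map<List<Double>, String> labels = new HashMap<>();
        labels.put(Arrays.asList(0.3, 1.0), "topic-3");
        System.out.println(labels.get(Arrays.asList(0.1 + 0.2, 1.0))); // prints null
    }
}
```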
-jake