D, I think that the current idea on the table is to do essentially nothing except carefully examine and possibly remove cases of tables or sets with Vector keys.
What specifically are you arguing against? On Fri, Feb 24, 2012 at 12:25 AM, Dmitriy Lyubimov <[email protected]>wrote: > Right. Ok. > > just in case, here's my updated list of cons ... > > 1) if people explicitly happen to write a.equals(b) where they just as > easily could've written a==b if they were ok with identity comparison, > it is pretty surely means they expect equals-by-value and not identity > comparison. > 2) same goes for HashMap/IdentityHashMap (with much less obviousness > though). > > 3) We probably don't want to prohibit the use of vectors as MR map > output keys (although i don't immediately see use cases for it), that > means we have to support comparable-by-value, which would be > contractually incoherent with equals-by-identity at the same time. > > 4) What Ted said about precaching hash codes for Strings I seem to be > able to easily convert into another cons: apparently best practice > suggests to go extra step to optimize hash coding rather than to drop > equals-by-value, hashCode-by-value. There are techniques to improve > that (e.g. same identity comparison can be had since if a==b then it > also true that a.equals(b) also true. Hashcoding can be indeed cached > (with caveat that since vector really is not immutable, then it needs > to maintain lazy cashing and cache cancellation simiar to > GregorianCalendar stuff, but that's probably a hassle). > > I guess we can look into some hash coding optimization if it seems > compelling that we do so. well i guess Sean is saying that too. > > On Thu, Feb 23, 2012 at 2:31 PM, Sean Owen <[email protected]> wrote: > > I think equals() and hashCode() ought to be retained at least for > > consistency. There is no sin in implementing these per se. If hashCode() > is > > too slow in some context then it can be cached if it is so important. > This > > is what String does. > > > > However I doubt these should be used as keys in general - that is the > issue > > to be fixed if anything if there is a performance problem. > > > > Do you want people to use equals()? Dunno it's up to the caller really. > > > > Sean > > On Feb 23, 2012 5:25 PM, "Jake Mannix" <[email protected]> wrote: > > > >> Hey Devs. > >> > >> Was prototyping some stuff in Mahout last night, and noticed something > >> I'm not sure if we've talked about before: because we have equals() for > >> Vector instances return true iff the numeric values of the vectors are > >> equal, and we also have a consistent hashCode(), anytime you have > >> HashMap<Vector, Anything>, all the typical things you think are O(1) are > >> really O(vector.numNonZeroes()). I tried to look through the codebase > and > >> see where we hang onto maps with vector keys, and we do it sometimes. > >> Maybe we shouldn't? Most Vectors have identities (clusterId, > documentId, > >> topicId, etc...) which we could normalize away... or maybe we should be > >> using IdentityHashMap, to ensure you're using strict object identity and > >> avoid doing this calculation? This could be really slow if these are > big > >> dense vectors, for instance. > >> > >> This looks like it could be a really easy place to accidentally add > heavy > >> complexity to things. Do we really want people do be checking > >> *mathematical* equals() on vectors which have floating point precision? > >> > >> -jake > >> >
