What version of Julia are you using? The `in` operator doesn't use isequal anymore; it uses `==`. I agree that isequal-based hashing is not super intuitive, but I don't think the approach to hashing that you're using here is viable. We need something that scales and generalizes better *and* is more intuitive.
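For reference, here is a short sketch of the distinction being discussed. It assumes a current Julia 1.x release rather than the 2014 development build this thread refers to, so the lowercase `int32`/`int64` calls quoted further down become the `Int32`/`Int64` constructors. It shows `in` on an array behaving like `==`, and why `==` alone can't back a hash table: NaN is never `==` to itself. It deliberately avoids calling `isequal` on mixed integer types, because that is exactly the behavior under debate here and it differs between Julia versions.

```julia
# Assumes Julia 1.x: the 2014-era int32()/int64() helpers are now the
# constructors Int32() and Int64().
x = Int32(4)
y = Int64(4)

# == compares numeric values across types.
println(x == y)        # true

# `in` on an array compares elements with ==, so membership ignores the type.
println(x in [y])      # true

# == follows IEEE semantics: NaN is not equal to itself, so an ==-based
# membership test never finds a NaN.
println(NaN == NaN)    # false
println(NaN in [NaN])  # false

# isequal treats NaNs as equal and separates -0.0 from 0.0, which is the
# kind of equality a hash table needs.
println(isequal(NaN, NaN))   # true
println(isequal(-0.0, 0.0))  # false
```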
On Wed, Jan 22, 2014 at 12:36 PM, Sharmila Gopirajan Sivakumar wrote:

> Hi Stefan,
>
> I beg to differ. Julia's current collection of numeric types will meet the needs of almost all users. Users will mostly be defining composite types. In the rare case that they are defining a bitstype, its usage semantics would most certainly deviate so much from the built-in numeric types that it might not be equivalent to them. A typical example would be Char and Int32. When users add a new type, they have the freedom to specify how it should be treated by adding a new method to the 'hash' function, dispatching on the user-defined type. We could document the procedure to help users define how their types should be hashed.
>
> Treating the numeric types differently during hashing would be inconsistent with how the rest of the built-in operations deal with numeric types. It increases the mental burden on users, who must remember that only in a Dict do equal values not mean equal keys. This will be a common pitfall for most Julia users, and we will spend more time educating them about how Dict works than we would spend documenting how to hash user-defined types. Also, users, once educated, will have to assiduously ensure that every access to a Dict uses the same data type.
>
> As an example, I was bitten by a similar bug a day ago. The variables defined in my Julia program defaulted to Int64. Unfortunately, one of the C API database calls returned Int32. I was comparing the result with a list of constants using the 'in' operator. Since the 'in' operator internally used isequal, the two were considered different even though they had the same value, and my code raised an error where there was none. This defeats the purpose of type conversions and promotions. We will face similar issues with Dict.
>
> Do 'Set' collections use hash too?
>
> Regards,
> Sharmi
>
> On Tuesday, January 21, 2014 11:54:12 PM UTC+5:30, Stefan Karpinski wrote:
>
>> This is very similar to how we used to do hashing. It would be fine if there were a fixed collection of numeric types in Julia, but of course that's not the case, and user-defined types need to be able to participate in the hashing behavior, which rapidly spirals out of control. That's what motivated the change to the current behavior, which unfortunately leaves a rather large gap in functionality: there's no good way to express an equality comparison that doesn't care about type but considers NaNs to be equal values, which happens to be what I think hashing should probably do.
>>
>> On Jan 21, 2014, at 12:53 PM, Sharmila Gopirajan wrote:
>>
>> Thanks for the heads up. I will use master then. I am still interested in implementing the hashing strategy for numbers, so any feedback would be great.
>>
>> Regards,
>> Sharmi
>>
>> On Tue, Jan 21, 2014 at 10:53 PM, Milan Bouchet-Valat wrote:
>>
>>> On Tuesday, January 21, 2014 at 00:13 -0500, Jeff Bezanson wrote:
>>>
>>> The main reason is that there are many types of numbers, with more added all the time. And for the purposes of hash tables, it is difficult to ensure that all numerically equal numbers hash the same. So we had isequal(), which is used by dictionaries, distinguish numbers of different types. At this point, we would kind of like to change this back and make isequal more liberal (although it would still distinguish -0.0 and 0.0, and so not be strictly more liberal than ==). However, the hashing problem remains. Any ideas are welcome.
>>>
>>> Actually, you changed the behavior of 'in' to use == instead of isequal() after I filed an issue:
>>> https://github.com/JuliaLang/julia/issues/4941
>>>
>>> With git master as of a few days ago, this works:
>>>
>>> julia> x = int32(4)
>>> 4
>>>
>>> julia> y = int64(4)
>>> 4
>>>
>>> julia> x == y
>>> true
>>>
>>> julia> x in [y]
>>> true
>>>
>>> That doesn't mean hashing shouldn't be improved, though.
>>>
>>> Regards
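The thread doesn't show the hashing strategy Sharmi was working on, so the following is only a hypothetical sketch of the general idea of hashing numbers by value. It assumes Julia 1.x, covers only `Real` numbers, and `value_hash` is an invented name, not part of Base. Integral values are widened to `BigInt` and everything else is rounded to `Float64`, so numerically equal values of different built-in types land on the same hash.

```julia
# Hypothetical helper (not part of Base), assuming Julia 1.x: hash built-in
# real numbers by value, so numerically equal numbers of different types
# produce the same hash.
function value_hash(x::Real, h::UInt = zero(UInt))
    if isfinite(x) && isinteger(x)
        # Integral values: widen to BigInt so Int32(4), Int64(4) and 4.0 agree.
        # This follows ==, so -0.0 and 0.0 collide; an isequal-based table
        # keeps them as distinct keys, and a collision is harmless there.
        return hash(BigInt(x), h)
    else
        # Everything else (fractions, Inf, NaN): round to Float64. Values that
        # are == still round to the same Float64, so equal values match; the
        # rounding only adds collisions between unequal values.
        return hash(Float64(x), h)
    end
end

println(value_hash(Int32(4)) == value_hash(Int64(4)))  # true
println(value_hash(1//2) == value_hash(0.5))           # true
```

The catch Stefan and Jeff describe shows up immediately: a user-defined `Real` type only participates if it supports `isfinite`, `isinteger`, and conversion to `BigInt` and `Float64`, and any type whose `==` doesn't agree with those conversions would hash inconsistently with its own equality.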
