I was using version 0.2. After Milan Bouchet-Valat replied that this is
fixed in master, where 'in' uses '==', I switched to master. I quoted that
example only to illustrate the kind of issues that unexpected comparison
results produce.

I'm sorry if my previous email implied so, but I do not insist that we use
the approach to hashing that I proposed earlier.  I just want the semantics
of the numeric types to be congruent everywhere.

How about this?
1. If the user provides a hashing function, we will default to that.
2. Else, if the parent type is any of the numeric types:
           a.  If a data type is a subtype of Integer and is a bitstype, we
will consider it to be an int and cast it to the appropriate integer
subtype of the same size or larger.
           b.  Similarly, if a data type is a subtype of Float and is a
bitstype, we will consider it to be a float and cast it to the appropriate
float subtype of the same size or larger.
           c.  If it is a numeric type but composite, we could use a tuple
of the fields as input for hashing.
3. But if the user does not declare one of the numeric types as the
supertype, then we will not be able to second-guess his/her intentions and
we will default to id.

This assumes that we can determine whether a datatype is a bitstype and
also that, for a given type, we can introspect all its fields.
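
As a rough illustration of the three rules above, here is a minimal sketch
in Python rather than Julia. Everything in it is an assumption made for
illustration: `proposed_hash`, `user_hashers` and `Ratio` are hypothetical
names, a dataclass stands in for a composite type, and Python's unbounded
`int` and its `float` stand in for "a subtype of the same size or larger".
Rule 2b is read here as casting to a float type.

```python
import numbers
from dataclasses import dataclass, fields, is_dataclass


def proposed_hash(x, user_hashers=None):
    """Hypothetical helper following rules 1 to 3; not an existing API."""
    # Rule 1: a user-provided hashing function takes precedence.
    for typ, fn in (user_hashers or {}).items():
        if isinstance(x, typ):
            return fn(x)

    if isinstance(x, numbers.Number):
        # Rule 2c: composite numeric type, hash a tuple of its fields.
        if is_dataclass(x):
            return hash(tuple(getattr(x, f.name) for f in fields(x)))
        # Rule 2a: integer-like value, cast to the widest integer type.
        if isinstance(x, numbers.Integral):
            return hash(int(x))
        # Rule 2b (as read here): real-valued, cast to a float type.
        if isinstance(x, numbers.Real):
            return hash(float(x))
        # Other numeric kinds fall through to rule 3 in this sketch.

    # Rule 3: not declared numeric, so fall back to object identity.
    return id(x)


@dataclass(frozen=True)
class Ratio(numbers.Number):
    """Hypothetical composite numeric type with two fields."""
    num: int
    den: int


same_value = proposed_hash(4) == proposed_hash(4.0)
composite = proposed_hash(Ratio(1, 2)) == hash((1, 2))
custom = proposed_hash("abc", {str: len})
```

Note that `same_value` only holds because Python's built-in `hash` already
agrees for equal ints and floats; a Julia version would need its own way of
making the integer and float branches agree.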

The ideas that I have put up in this mail thread are based on Python's
hashing implementation. I'm sure there are other solutions out there which
might better suit our situation. I would love to hear about them.
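
For reference, the Python behaviour referred to here can be checked with a
few lines. This is only a small demonstration of the property that
numerically equal values of different numeric types hash the same in
Python; it does not show how Python achieves this, and the variable names
are illustrative only.

```python
from decimal import Decimal
from fractions import Fraction

# Four different numeric types holding the same value.
values = [4, 4.0, Fraction(4, 1), Decimal(4)]

# They all compare equal with ==.
all_equal = all(a == b for a in values for b in values)

# They also share a single hash value.
distinct_hashes = {hash(v) for v in values}

# So a dict keyed by one of them is found by any of the others.
table = {4: "four"}
lookups = [table[v] for v in values]

print(all_equal, len(distinct_hashes), lookups)
```

This is the property that would make Dict lookups independent of whether a
key arrived as Int32 or Int64.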


On Wed, Jan 22, 2014 at 11:28 PM, Stefan Karpinski <[email protected]> wrote:

> What version of Julia are you using? The `in` operator doesn't use isequal
> anymore – it uses `==`. I agree that isequal hashing is not super
> intuitive, but I disagree that the approach to hashing that you're using
> here is viable. We need something that scales and generalizes
> better *and* is more intuitive.
>
>
> On Wed, Jan 22, 2014 at 12:36 PM, Sharmila Gopirajan Sivakumar <
> [email protected]> wrote:
>
>> Hi Stefan,
>>            I beg to differ. Julia's current collection of numeric types
>> will meet the needs of almost all users.  Users will mostly be defining
>> composite types.  In the rare case that they are defining a bitstype, its
>> usage semantics would most certainly deviate from the built-in numeric
>> types so much that it would not be equivalent to them.  A typical example
>> would be Char and Int32.  When the user adds a new type, he has the
>> freedom to specify how his type should be treated by adding a new method to
>> the 'hash' function, dispatching on the user-defined type as
>> parameter.  We could document the procedure to help the user define how his
>> user-defined type should be hashed.
>>
>>            Treating the numeric types differently during hashing would be
>> inconsistent with how the rest of the built-in operations deal with numeric
>> types.  It increases the mental burden on the user, who has to remember
>> that only in Dict do the same values not mean the same keys.  This will be
>> a common pitfall for most Julia users, and we will have to spend more time
>> explaining how Dict works than we would have to spend specifying how to
>> hash user-defined types.  Also, the user, once educated, will have to
>> assiduously ensure that all accesses of the Dict use the same datatype.
>>
>>            As an example, I was bitten by a similar bug a day ago.  The
>> variables defined in my Julia program defaulted to Int64.  Unfortunately,
>> one of the C API database calls returned Int32.  I was comparing the result
>> with a list of constants using the 'in' operator.  Since the 'in' operator
>> internally used isequal, the two were considered different even though
>> they were the same by value, and an error was raised where there was none.
>> This defeats the purpose of type conversions and promotions.  We will face
>> similar issues in Dict also.
>>
>> Does the 'Set' collections use hash too?
>>
>> Regards,
>> Sharmi
>>
>>
>> On Tuesday, January 21, 2014 11:54:12 PM UTC+5:30, Stefan Karpinski wrote:
>>
>>> This is very similar to how we used to do hashing. It would be fine if
>>> there were a fixed collection of numeric types in Julia, but of course
>>> that's not the case and user-defined types need to be able to participate
>>> in the hashing behavior, which rapidly spirals out of control. That's what
>>> motivated the change to the current behavior, which unfortunately leaves a
>>> rather large gap in functionality since there's no good way to express
>>> equality comparison that doesn't care about type but considers NaNs to be
>>> equal values – which happens to be what I think hashing should probably do.
>>>
>>> On Jan 21, 2014, at 12:53 PM, Sharmila Gopirajan <[email protected]>
>>> wrote:
>>>
>>> Thanks for the heads up.  I will use the master then. I am still
>>> interested in implementing the hashing strategy for numbers.  So any
>>> feedback would be great.
>>>
>>> Regards,
>>> Sharmi
>>>
>>>
>>> On Tue, Jan 21, 2014 at 10:53 PM, Milan Bouchet-Valat
>>> <[email protected]> wrote:
>>>
>>>>  On Tuesday, 21 January 2014 at 00:13 -0500, Jeff Bezanson wrote:
>>>>
>>>> The main reason is that there are many types of numbers, with more
>>>> added all the time. And for purposes of hash tables, it is difficult
>>>> to ensure that all numerically-equal numbers hash the same. So we had
>>>> isequal(), which is used by dictionaries, distinguish numbers of
>>>> different types. At this point, we would kind of like to change this
>>>> back and make isequal more liberal (although it would still
>>>> distinguish -0.0 and 0.0, and so not be strictly more liberal than
>>>> ==). However, the hashing problem remains. Any ideas are welcome.
>>>>
>>>>  Actually, you changed the behavior of in to use == instead of
>>>> isequal() after I filed an issue:
>>>> https://github.com/JuliaLang/julia/issues/4941
>>>>
>>>>
>>>> With git master as of a few days ago, this works:
>>>>
>>>> julia> x = int32(4)
>>>> 4
>>>>
>>>> julia> y = int64(4)
>>>> 4
>>>>
>>>> julia> x == y
>>>> true
>>>>
>>>> julia> x in [y]
>>>> true
>>>>
>>>> That doesn't mean hashing shouldn't be improved, though.
>>>>
>>>>
>>>> Regards
>>>>
>>>
>>>
>
