Never mind. I found the checked in jar in the lib folder.

Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.


On Thu, Jun 6, 2013 at 3:13 AM, Robin Anil <[email protected]> wrote:

> Dawid, could you explain how this is accessing the mahout trunk code? Is
> that part of the package step?
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Wed, Jun 5, 2013 at 8:20 PM, Sean Owen <[email protected]> wrote:
>
>> On Wed, Jun 5, 2013 at 7:00 PM, Dawid Weiss
>> <[email protected]> wrote:
>> > 1) if you know your keys are 0...N-1 (dense) then there is typically no
>> > need to use a map; you'll do just as well with a regular lookup array :)
>>
>> Yeah of course... here I think there is no guarantee of this although
>> in many practical use cases the keys are drawn from a small range. For
>> example 1000 user IDs are often around the range 1 - 1000 or
>> something.
>>
>>
>> > Having written the above it made me wonder why Mahout's performance does
>> > not drop significantly on those malicious upper-bit changing
>> benchmarks, so
>> > I checked the code. The thing is Mahout (and I think Colt, originally)
>> uses
>> > double hashing with prime-sized tables (not 2^n-1 masking) -- this is
>> nice
>> > and indeed avoids the problems of clustering mentioned in pt. 2 above.
>>
>> Yes I thought this was why you could largely get away with it. If the
>> table is big enough, collisions and chains should not be a big deal to
>> begin with.
>>
>> Since you have this benchmark readily available -- what happens if you
>> un-comment one or both of the lines of code that mixes the hash? That
>> would really answer it.
>>
>> i'm curious as I have certainly made this assumption independently in
>> the custom hashtable implementations I made long ago (which also use
>> double hashing) and still use in the project. If it's not optimal
>> that's important info.
>>
>
>

Reply via email to