I use 64-bit keys for vector-like data structures. You do pay a cost in
extra RAM, but the benefits are significant: mostly simplicity, plus
making the probability of hash collisions negligible even at huge
scale. I think it's worthwhile overall.
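To make the trade-off concrete, here is a small back-of-the-envelope sketch (my own illustration, not Mahout code) using the standard birthday-bound approximation: with n keys hashed uniformly into d slots, the expected number of colliding pairs is about n(n-1)/(2d). At a billion features, 32-bit keys are guaranteed to collide heavily, while 64-bit keys make collisions a rounding error:

```java
// Illustration only: birthday-bound estimate of expected hash-collision
// pairs when n keys are drawn uniformly from a key space of size d.
public class CollisionEstimate {

    // Expected number of colliding pairs: n * (n - 1) / (2 * d).
    static double expectedCollidingPairs(double n, double d) {
        return n * (n - 1) / (2.0 * d);
    }

    public static void main(String[] args) {
        double n = 1e9;                    // a billion features
        double d32 = Math.pow(2, 32);      // 32-bit key space
        double d64 = Math.pow(2, 64);      // 64-bit key space

        // 32-bit: on the order of 1e8 colliding pairs -- collisions are certain.
        System.out.printf("32-bit: ~%.2e expected colliding pairs%n",
                expectedCollidingPairs(n, d32));
        // 64-bit: on the order of 1e-2 -- most likely zero collisions.
        System.out.printf("64-bit: ~%.2e expected colliding pairs%n",
                expectedCollidingPairs(n, d64));
    }
}
```

So at a billion keys, 32-bit hashing produces tens of millions of expected collisions, while 64-bit hashing gives roughly a 3% chance of even a single collision.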

On Wed, Jun 19, 2013 at 6:16 PM, Robin Anil <[email protected]> wrote:
> <rant>
> Which joker thought of removing uint from Java?
> </rant>
>
> Dan, the cost of moving to 64-bit indices is extra RAM usage. My
> experiments show that 32 bits is enough to hash billions of features.
> Do we ever need quadrillions of features? Can machine learning truly
> work at that scale? Think about these.
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Wed, Jun 19, 2013 at 5:16 AM, Dan Filimon <[email protected]> wrote:
>
>> Also, this is particularly problematic because indices can't be negative,
>> so only 2^31 elements are actually possible.
>>
>>
>> On Wed, Jun 19, 2013 at 1:15 PM, Dan Filimon <[email protected]> wrote:
>>
>> > Hi everyone!
>> >
>> > The current Vector API only supports 32-bit indices for Vectors.
>> >
>> > I feel that 64 bits would be more appropriate, especially because the
>> > indices are likely to be hash values of other data, and 32 bits will
>> > result in quite a few collisions.
>> >
>> > Also, for some jobs, notably ItemSimilarityJob, this restriction means
>> > that we need a special id-to-index map, where we'll collide anyway.
>> >
>> > What do you think about adding support for 64bit indices?
>> > Is anyone at all interested?
>> >
>>
