I use 64-bit keys for vector-like data structures. You do pay a cost in extra RAM, but the benefits are considerable, mostly in simplicity and in making the probability of hash collisions negligible even at huge scale. I think it's worthwhile overall.
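A quick back-of-the-envelope check of the collision claim, using the standard birthday approximation (expected collisions ≈ n(n-1)/2m for n keys hashed into a space of size m). This is an illustrative sketch, not Mahout code; the class and method names are made up for the example:

```java
public class CollisionEstimate {
    // Expected number of pairwise hash collisions among n keys drawn
    // uniformly from a space of size m (birthday approximation).
    static double expectedCollisions(double n, double m) {
        return n * (n - 1) / (2.0 * m);
    }

    public static void main(String[] args) {
        double n = 1e9; // one billion features
        // 32-bit space: roughly a hundred million expected collisions.
        System.out.printf("32-bit: %.3g expected collisions%n",
                expectedCollisions(n, Math.pow(2, 32)));
        // 64-bit space: expected collisions well below one.
        System.out.printf("64-bit: %.3g expected collisions%n",
                expectedCollisions(n, Math.pow(2, 64)));
    }
}
```

With a billion keys, 32-bit hashes collide massively while 64-bit hashes almost certainly don't, which is the sense in which collisions become negligible even at huge scale.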
On Wed, Jun 19, 2013 at 6:16 PM, Robin Anil <[email protected]> wrote:

> <rant>
> Which joker thought of removing uint from Java?
> </rant>
>
> Dan, the cost of moving to 64 bit for the index is extra RAM usage. My
> experiments show that 32 bits is enough to hash down billions of features.
> Do we ever need such quadrillions of features? Can machine learning truly
> work at that scale? Think about these.
>
> Robin Anil | Software Engineer | +1 312 869 2602 | Google Inc.
>
>
> On Wed, Jun 19, 2013 at 5:16 AM, Dan Filimon
> <[email protected]> wrote:
>
>> Also, this is particularly problematic because indices can't be negative, so
>> only 2^31 elements are actually possible.
>>
>>
>> On Wed, Jun 19, 2013 at 1:15 PM, Dan Filimon <[email protected]> wrote:
>>
>> > Hi everyone!
>> >
>> > The current Vector API only supports 32-bit maximum indices for Vectors.
>> >
>> > I feel that 64 bits would be more appropriate, especially because the
>> > indices are likely to be hash values of other data and 32 bits will
>> > result in quite a few collisions.
>> >
>> > Also, for some jobs, notably ItemSimilarityJob, this restriction means
>> > that we need a special id-to-index map where we'll collide anyway.
>> >
>> > What do you think about adding support for 64-bit indices?
>> > Is anyone at all interested?
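To make the "indices can't be negative" point concrete: Java has no unsigned int, so a signed 32-bit index only spans 2^31 - 1 usable non-negative values, and a 32-bit hash used as an index must give up its sign bit, halving the effective hash space. A minimal illustration (the feature string is arbitrary):

```java
public class IndexRange {
    public static void main(String[] args) {
        // Largest usable vector index with a signed 32-bit int: 2^31 - 1.
        System.out.println(Integer.MAX_VALUE); // 2147483647

        // A 32-bit hash can be negative; masking off the sign bit makes it
        // a valid index but shrinks the hash space from 2^32 to 2^31.
        int h = "some feature".hashCode();
        int index = h & 0x7fffffff;
        System.out.println(index >= 0); // always true
    }
}
```

With 64-bit (long) indices the same sign-bit sacrifice still leaves 2^63 usable values, which is why the restriction matters so much less there.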
