Re: [classlib] Some further study on hashcode and HashMap(Was Re: Performance improvement of java.util.HashMap)

Xiao-Feng Li Wed, 04 Jul 2007 01:34:15 -0700

On 7/4/07, Jimmy,Jing Lv <[EMAIL PROTECTED]> wrote:

Hi XiaoFeng,


     Ah, I may be explain myself very well here :) The bit shifting
algorism is NOT used for randomization itself, but for trying to
gather all infomation of 32 bits into log(2,buckectsize) bits, if the
hashcode itself is not random enough, the indexing algorism will
surely make a bad index. So I've made two hypothesis above. The more
random the hashcode is, the better the shifting-algorism works. On the
other hand, as the lower bits are important, if the last bit is
wasted, it does harm to this shifting-algorism.
     At the first I only check IBMVME, finding it always ends with a
even number, which means the the last bit of default hashcode of an
object is always '0', as a result the last bit is useless for hashing.
And by checking DRLVM, I find it similar,  ends with 2 bits as '00',
so we waste 2 bits here.
     Hashcode is used for hash, as a result,  I believe obj_addr >> 2
will improve hashcode randomness at a very low cost. I was thinking if
we take it in the classlib, however we can not tell if the hashcode is
from obj_addr or user define. As a result, we'd better put it in VM.
This cost is worth, for the more random the hoshcode is, the better
hash-related classes performs (That's the reason why we discuss
HashMap indexing algorism, this shifting algorism will properly
consume more time than using hashcode alone in this single method,
however it do improve a lot in the whole)
     Will it do some harm to other part of vm/classlib if we modify
IdentityHashCode in VM (or kernel class java.lang.Object)?  If not, I
guess we'd better modify IdentityHashCode to improve performance.


Thanks for the explanation. No, I don't think this >>2 costs anything
if implemented in VM. Probably you are right the >>2 helps to remove
some useless info.

Well, maybe I should read some random number theory, since I think
although >>2  removes the meaningless bits in LSB, it introduces two
meaningless bits in MSB. If you treat the 32 bit equally when encoding
their info into log(num_bucket) bits, the bits in LSB and MSB give you
the same effect. Or you assume the lower bits are more significant by
default?

Thanks,
xiaofeng

2007/7/4, Xiao-Feng Li <[EMAIL PROTECTED]>:
> On 7/4/07, Jimmy,Jing Lv <[EMAIL PROTECTED]> wrote:
> > Hi All,
> >
<SNIP>
> >
> >     To sum up, 1) I believe Sergey's idea do helps a lot in
> > hash-related classes, which requires VM guys to do some modification
> > in IdentityHashCode (a simple guess, object address >> 2 ?)
>
> Jimmy, do you suggest to use obj_addr >> 2 or a more complicated
> hashing in VM? I am not sure if this >>2 really helps if the bucket
> indexing does bit-shift or more randomization itself.
>
> (Note if we use a formula for hashing, it will be computed every time
> it's accessed, we need consider the overhead of the computation. >>2
> is very cheap, but as said above, I doubt about its effectiveness.)
>
> Thanks,
> xiaofeng
>
<SNIP>
> >
>
>
> --
> http://xiao-feng.blogspot.com
>


--

Best Regards!

Jimmy, Jing Lv
China Software Development Lab, IBM



--
http://xiao-feng.blogspot.com

Re: [classlib] Some further study on hashcode and HashMap(Was Re: Performance improvement of java.util.HashMap)

Reply via email to