On 7/4/07, Jimmy,Jing Lv <[EMAIL PROTECTED]> wrote:
Hi XiaoFeng,Ah, I may be explain myself very well here :) The bit shifting algorism is NOT used for randomization itself, but for trying to gather all infomation of 32 bits into log(2,buckectsize) bits, if the hashcode itself is not random enough, the indexing algorism will surely make a bad index. So I've made two hypothesis above. The more random the hashcode is, the better the shifting-algorism works. On the other hand, as the lower bits are important, if the last bit is wasted, it does harm to this shifting-algorism. At the first I only check IBMVME, finding it always ends with a even number, which means the the last bit of default hashcode of an object is always '0', as a result the last bit is useless for hashing. And by checking DRLVM, I find it similar, ends with 2 bits as '00', so we waste 2 bits here. Hashcode is used for hash, as a result, I believe obj_addr >> 2 will improve hashcode randomness at a very low cost. I was thinking if we take it in the classlib, however we can not tell if the hashcode is from obj_addr or user define. As a result, we'd better put it in VM. This cost is worth, for the more random the hoshcode is, the better hash-related classes performs (That's the reason why we discuss HashMap indexing algorism, this shifting algorism will properly consume more time than using hashcode alone in this single method, however it do improve a lot in the whole) Will it do some harm to other part of vm/classlib if we modify IdentityHashCode in VM (or kernel class java.lang.Object)? If not, I guess we'd better modify IdentityHashCode to improve performance.
Thanks for the explanation. No, I don't think this >>2 costs anything if implemented in VM. Probably you are right the >>2 helps to remove some useless info. Well, maybe I should read some random number theory, since I think although >>2 removes the meaningless bits in LSB, it introduces two meaningless bits in MSB. If you treat the 32 bit equally when encoding their info into log(num_bucket) bits, the bits in LSB and MSB give you the same effect. Or you assume the lower bits are more significant by default? Thanks, xiaofeng
2007/7/4, Xiao-Feng Li <[EMAIL PROTECTED]>: > On 7/4/07, Jimmy,Jing Lv <[EMAIL PROTECTED]> wrote: > > Hi All, > > <SNIP> > > > > To sum up, 1) I believe Sergey's idea do helps a lot in > > hash-related classes, which requires VM guys to do some modification > > in IdentityHashCode (a simple guess, object address >> 2 ?) > > Jimmy, do you suggest to use obj_addr >> 2 or a more complicated > hashing in VM? I am not sure if this >>2 really helps if the bucket > indexing does bit-shift or more randomization itself. > > (Note if we use a formula for hashing, it will be computed every time > it's accessed, we need consider the overhead of the computation. >>2 > is very cheap, but as said above, I doubt about its effectiveness.) > > Thanks, > xiaofeng > <SNIP> > > > > > -- > http://xiao-feng.blogspot.com > -- Best Regards! Jimmy, Jing Lv China Software Development Lab, IBM
-- http://xiao-feng.blogspot.com
