[
https://issues.apache.org/jira/browse/LUCENE-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yonik Seeley updated LUCENE-977:
--------------------------------
Attachment: hash.patch
Here is a patch that adds in 7 new bits (the rightmost bit is destroyed to make
the number odd) when calculating the incrementor via
final int inc = ((code>>8)+code)|1;
And thus gives 128 different possible paths to follow *per slot* on the first
collision on that slot.
Ideally we would shift log2(size_of_hashtable), but it's probably not worth
calculating that and I chose a small shift so it would work well for small
hashCodes (say from a very short string).
Given that equals() in these cases is probably pretty fast, the average speedup
is probably relatively minimal.
Comments?
> internal hashing improvements
> -----------------------------
>
> Key: LUCENE-977
> URL: https://issues.apache.org/jira/browse/LUCENE-977
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Yonik Seeley
> Attachments: hash.patch
>
>
> Internal power-of-two closed hashtable traversal in DocumentsWriter and
> CharArraySet could be better.
> Here is the current method of resolving collisions:
> if (text2 != null && !equals(text, len, text2)) {
> final int inc = code*1347|1;
> do {
> code += inc;
> pos = code & mask;
> text2 = entries[pos];
> } while (text2 != null && !equals(text, len, text2));
> The problem is that two different hashCodes with the same lower bits will
> keep picking the same slots (the upper bits will be ignored).
> This is because multiplication (*1347) only really shifts bits to the left...
> so given that the two codes already matched on the right, they will both pick
> the same increment, and this will keep them on the same path through the
> table (even though it's being added to numbers that differ on the left). To
> resolve this, some bits need to be moved to the right when calculating the
> increment.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]