[ 
https://issues.apache.org/jira/browse/HADOOP-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703685#action_12703685
 ] 

Shevek commented on HADOOP-5727:
--------------------------------

bq. This is a good point. It is not necessary to create an object. However, 
returning the id directly might not lead to a good hash code. We may have to 
add a hash code implementation. 

This was debated at length in the very early days of Java: consecutive 
values, including Strings, have always had consecutive hash codes. In 
practice, it does not really matter because:

* HashMap is a chained hash table (each bucket holds a list), so consecutive 
hashes followed by a hash collision do not cause a walk of a long linear 
chain, as they would under linear probing. The length of the chain walked 
because of a collision will only be 2.
* HashMap applies a supplementary transformation to hash codes (a short 
sequence of shifts and XORs), so it is not necessary to ensure distribution 
or uniformity of the basic hash codes. In fact, HashMap probably does better 
than the application code would do.
* Other Map strategies, such as the red-black tree behind TreeMap, do not use 
hashCode() at all.
* In general, the fact that consecutive objects, especially numbers, generate 
consecutive hash codes is accepted and understood by library authors who 
require more uniform distribution of hash codes, and they compensate for it 
in their own code.
* J2SE itself uses this strategy internally (Integer.hashCode() returns the 
int value), and its authors spend a lot more time thinking about these 
problems than we do.
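
To make the points above concrete, here is a minimal sketch, not code from the patch. The class and method names are hypothetical, and spread() assumes the shift-and-XOR step used by OpenJDK 8's HashMap; earlier releases used a longer sequence of shifts and XORs.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of Hadoop or the JDK.
class ConsecutiveHashDemo {
    // Assumption: mirrors the spreading step in OpenJDK 8's HashMap.hash().
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table size.
    static int bucketIndex(int hash, int tableSize) {
        return (tableSize - 1) & spread(hash);
    }

    // Counts the distinct buckets occupied by ids first .. first+count-1.
    static int distinctBuckets(int first, int count, int tableSize) {
        Set<Integer> buckets = new HashSet<>();
        for (int id = first; id < first + count; id++) {
            // Integer.hashCode(int) returns the value itself.
            buckets.add(bucketIndex(Integer.hashCode(id), tableSize));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        // 16 consecutive ids in a 16-bucket table.
        System.out.println(distinctBuckets(1000, 16, 16)); // prints 16
    }
}
```

Under these assumptions, sixteen consecutive ids land in sixteen different buckets, so consecutive hash codes are close to the best case for a table like this, not a problem.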

I therefore submit that this patch should be applied as-is.

> Faster, simpler id.hashCode() which does not allocate memory
> ------------------------------------------------------------
>
>                 Key: HADOOP-5727
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5727
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Shevek
>         Attachments: 00_id-noallocate.patch, 03_id-noallocate.patch
>
>
> Integer.valueOf allocates memory if the integer is not in the object cache, 
> which is the vast majority of cases for the task id. It is possible to 
> compute the hash code of an integer without going via the integer cache, and 
> hence to avoid allocating memory.
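
The description above can be sketched as follows. SimpleId is a hypothetical stand-in, not the actual Hadoop class, and boxedHashCode() is included only for comparison; the attached patch may differ in detail.

```java
// Hypothetical stand-in for an id class; not the actual Hadoop source.
final class SimpleId {
    private final int id;

    SimpleId(int id) {
        this.id = id;
    }

    // Boxing variant: Integer.valueOf may allocate a new Integer when the
    // value is outside the cache range (-128..127 by default).
    int boxedHashCode() {
        return Integer.valueOf(id).hashCode();
    }

    // Allocation-free variant: Integer.hashCode() is defined as the int
    // value, so returning the field gives the same result.
    @Override
    public int hashCode() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleId && ((SimpleId) o).id == id;
    }
}
```

Both variants return the same value for every id; the only difference is that the second never creates an Integer object.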
