[ https://issues.apache.org/jira/browse/HADOOP-5727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12703685#action_12703685 ]
Shevek commented on HADOOP-5727:
--------------------------------
bq. This is a good point. It is not necessary to create an object. However,
returning the id directly might not lead to a good hash code. We may have to
add a hash code implementation.
This question was debated at length in the very early days of Java.
Consecutive values, including Strings and the like, have always had
consecutive hash codes. In practice, it does not really matter because:
* HashMap is a chained hash table (each bucket holds a linked list), so a run
of consecutive hashes followed by a hash collision does not cause a walk along
a long linear probe sequence, as it would under open addressing. The length of
the chain walked because of such a collision will only be 2.
* HashMap applies a supplementary transformation to hash codes (a shift-and-XOR
bit-mixing step, not a Mersenne Twister, which is a random number generator),
so it is not necessary to ensure distribution or uniformity of the basic hash
codes. In fact, HashMap probably does better than the application code would
do.
* Other Map strategies, such as the red-black tree behind TreeMap, do not use
hashCode() at all; they order keys by comparison.
* In general, the fact that consecutive objects, especially numbers, generate
consecutive hash codes is accepted and understood by library authors. Those
who require a more uniform distribution of hash codes compensate for it in
their own code.
* J2SE itself uses this strategy internally (Integer.hashCode() returns the
int value), and its authors spend a lot more time thinking about these
problems than we do.
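To illustrate the first two points, here is a rough sketch that counts how
many distinct buckets a set of hash codes lands in for a power-of-two table.
The class BucketSpread and its methods are hypothetical names made up for this
example. The spread() step is modeled on the shift-and-XOR mixing in OpenJDK
8's HashMap; other JDK versions use different shifts, so treat the exact
constants as an assumption.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Hadoop or the JDK.
final class BucketSpread {
    // Bit-mixing step modeled on OpenJDK 8's HashMap (assumption: other
    // JDK versions mix with different shift amounts).
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Counts the distinct buckets hit by the hashes start, start + step, ...
    // in a table of tableSize buckets (tableSize must be a power of two).
    static int distinctBuckets(int start, int step, int count,
                               int tableSize, boolean mix) {
        Set<Integer> buckets = new HashSet<>();
        for (int i = 0; i < count; i++) {
            int h = start + i * step;
            if (mix) {
                h = spread(h);
            }
            buckets.add(h & (tableSize - 1));
        }
        return buckets.size();
    }

    public static void main(String[] args) {
        // Consecutive ids: each one lands in its own bucket.
        System.out.println(distinctBuckets(1000, 1, 16, 16, true));     // 16
        // Hashes differing only in high bits, unmixed: all collide.
        System.out.println(distinctBuckets(0, 1 << 16, 16, 16, false)); // 1
        // The same hashes after mixing: spread out again.
        System.out.println(distinctBuckets(0, 1 << 16, 16, 16, true));  // 16
    }
}
```

Consecutive ids are close to the best case for a power-of-two table, and the
mixing step covers the awkward case where hashes differ only in their high
bits.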
I therefore submit that this patch should be applied as-is.
> Faster, simpler id.hashCode() which does not allocate memory
> ------------------------------------------------------------
>
> Key: HADOOP-5727
> URL: https://issues.apache.org/jira/browse/HADOOP-5727
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Shevek
> Attachments: 00_id-noallocate.patch, 03_id-noallocate.patch
>
>
> Integer.valueOf allocates memory if the integer is not in the object-cache,
> which is the vast majority of cases for the task id. It is possible to
> compute the hash code of an integer without going via the integer cache, and
> hence to avoid allocating memory.
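
A minimal sketch of the idea in the description above, assuming a hypothetical
SimpleId class (this is not Hadoop's actual ID class and not the attached
patch). It relies only on the documented fact that Integer.hashCode() returns
the wrapped int value, so returning the field directly gives the same hash
code without boxing.

```java
// Hypothetical stand-in for an id class; names are illustrative only.
final class SimpleId {
    private final int id;

    SimpleId(int id) {
        this.id = id;
    }

    // Boxing version: Integer.valueOf may allocate a new Integer when the
    // value is outside the Integer cache (-128..127 by default).
    int boxedHashCode() {
        return Integer.valueOf(id).hashCode();
    }

    // Allocation-free version: Integer.hashCode() is specified to return
    // the int value itself, so the result is identical.
    @Override
    public int hashCode() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof SimpleId && ((SimpleId) o).id == id;
    }

    public static void main(String[] args) {
        SimpleId a = new SimpleId(123456);
        System.out.println(a.hashCode() == a.boxedHashCode()); // prints "true"
    }
}
```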