[ 
https://issues.apache.org/jira/browse/HADOOP-6904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12978922#action_12978922
 ] 

Doug Cutting commented on HADOOP-6904:
--------------------------------------

About hash sizes: Java hashcodes are designed for hash tables, a 
collision-friendly purpose.  Identifying protocols is not a collision-friendly 
purpose: a collision between protocol hashes would cause confusing failures.

Looking at:

  http://en.wikipedia.org/wiki/Birthday_problem#Probability_table

With 32-bit hashes, if there are 93 versions of a protocol, then there's a 1 
in a million chance (p=10^-6) of a collision, assuming a very good hash 
function.  Those are probably acceptable odds: a protocol might have 93 
variations, but there probably won't be anywhere near a million protocols.  If 
we used 64-bit hashes, then we could have millions of versions of millions of 
protocols before there's a decent chance of a collision.  That's a lot better.  
In Avro we use 128-bit MD5 hashes to identify protocols, which is probably 
overkill.  So I'm okay with 32-bit hashes here, but I'd be happier with 64.  A 
64-bit string hash is easy to write:

  http://stackoverflow.com/questions/1660501/what-is-a-good-64bit-hash-function-in-java-for-textual-strings
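
One possible shape for such a hash, as a minimal sketch.  It uses 64-bit FNV-1a 
over the UTF-8 bytes of the string, which is a choice made for this 
illustration and not necessarily the function from the linked answer.  The 
class name ProtocolHash is hypothetical and not part of Hadoop.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Hadoop: 64-bit FNV-1a hash of a string. */
final class ProtocolHash {
    // Standard 64-bit FNV-1a constants.
    private static final long FNV_OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long FNV_PRIME = 0x100000001b3L;

    private ProtocolHash() {}

    /** Hashes the UTF-8 encoding of s; multiplication wraps modulo 2^64. */
    static long hash64(String s) {
        long h = FNV_OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);   // mix in the next byte (treated as unsigned)
            h *= FNV_PRIME;    // then multiply by the FNV prime
        }
        return h;
    }
}
```

A caller would pass in whatever string describes the protocol, for example a 
concatenation of its method signatures; what goes into that string is an 
assumption here and is not specified in this comment.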

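The odds quoted above can be checked with the usual birthday-problem 
approximation, p ~= 1 - exp(-n(n-1) / (2 * 2^bits)), which assumes an ideal, 
uniformly distributed hash.  The sketch below is only an illustration; the 
class and method names are hypothetical.

```java
/** Hypothetical helper: birthday-problem estimate of hash collision odds. */
final class CollisionOdds {
    private CollisionOdds() {}

    /**
     * Approximate probability that at least two of n uniformly random
     * hashes of the given bit width collide:
     * 1 - exp(-n(n-1) / (2 * 2^bits)).
     */
    static double collisionProbability(double n, int bits) {
        double pairs = n * (n - 1) / 2.0;     // number of distinct pairs
        double space = Math.pow(2.0, bits);   // number of possible hash values
        // expm1 keeps precision when the probability is tiny.
        return -Math.expm1(-pairs / space);
    }
}
```

With this approximation, collisionProbability(93, 32) comes out at roughly 
10^-6, matching the table, while collisionProbability(1e6, 64) is on the order 
of 10^-8.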

> A baby step towards inter-version RPC communications
> ----------------------------------------------------
>
>                 Key: HADOOP-6904
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6904
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: ipc
>    Affects Versions: 0.22.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.22.0
>
>         Attachments: majorMinorVersion.patch, majorMinorVersion1.patch, 
> rpcCompatible-trunk.patch, rpcCompatible-trunk1.patch, 
> rpcCompatible-trunk2.patch, rpcVersion.patch, rpcVersion1.patch
>
>
> Currently RPC communication in Hadoop is very strict. If a client has a 
> different version from that of the server, a VersionMismatch exception is 
> thrown and the client cannot connect to the server. This forces us to update 
> both client and server all at once if an RPC protocol is changed. But 
> sometimes different versions do not mean the client & server are 
> incompatible. It would be nice if we could relax this restriction and 
> support inter-version communication.
> My idea is that DfsClient catches the VersionMismatch exception when it 
> connects to NameNode. It then checks whether the client & the server are 
> compatible. If yes, it sets the NameNode version in the dfs client and allows 
> the client to continue talking to NameNode. Otherwise, it rethrows the 
> VersionMismatch exception.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
