[ https://issues.apache.org/jira/browse/AVRO-1006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193699#comment-13193699 ]

Raymie Stata commented on AVRO-1006:
------------------------------------

Sam writes:{quote}On the 0.001% collision rate, that seems high to me - would a 
128-bit hash be a better choice?{quote}

Thanks for pointing this out.  It turns out the 0.001% figure is a bug in the
writeup; the actual probabilities are quite a bit lower: 3E-8 (0.000003%) for a
million-item cache, 3E-10 for 100K items, and 3E-12 for 10K items (I'd love to
have someone check my math).  Assuming one insertion per minute into a
fixed-size table (i.e., random eviction), you'd expect a collision every year
with the 1M-item cache, every century with 100K items, and every millennium
with 10K items.  This seems acceptable, especially since I expect these caches
to be closer to 10K items than a million (the updated writeup discusses this
point further).  So are you happier now with 64 bits?

(The doc defines a canonical text for schemas, and fingerprints based on that
text.  The patch will contain a function that returns the canonical text.
This approach implicitly standardizes how one would take an MD5 or SHA-xxx
fingerprint of a schema, but perhaps I can be explicit on this point.)
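The patch itself is not shown in this thread. As a hedged sketch, assuming the canonical text has already been computed (the "int" string below is an assumed example), the class below fingerprints the UTF-8 bytes of that text in two ways: with the 64-bit Rabin fingerprint that the Avro specification later published as CRC-64-AVRO, and with a standard java.security.MessageDigest such as MD5 or SHA-256. The names SchemaFingerprints, fingerprint64 and digest are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: fingerprinting a schema's canonical text (ASSUMED to be computed
// elsewhere and passed in as a String).
public class SchemaFingerprints {

    // Constant of the 64-bit Rabin fingerprint (CRC-64-AVRO); it is also the
    // fingerprint of empty input, since it seeds the computation.
    static final long EMPTY = 0xc15d213aa4d7a795L;
    static final long[] FP_TABLE = new long[256];

    static {
        // Precompute the byte-at-a-time lookup table.
        for (int i = 0; i < 256; i++) {
            long fp = i;
            for (int j = 0; j < 8; j++) {
                fp = (fp >>> 1) ^ (EMPTY & -(fp & 1L));
            }
            FP_TABLE[i] = fp;
        }
    }

    // 64-bit fingerprint of the UTF-8 bytes of the canonical text.
    static long fingerprint64(String canonicalText) {
        long fp = EMPTY;
        for (byte b : canonicalText.getBytes(StandardCharsets.UTF_8)) {
            fp = (fp >>> 8) ^ FP_TABLE[(int) (fp ^ b) & 0xff];
        }
        return fp;
    }

    // Any MessageDigest ("MD5", "SHA-256", ...) over the same canonical bytes.
    static byte[] digest(String canonicalText, String algorithm) {
        try {
            return MessageDigest.getInstance(algorithm)
                    .digest(canonicalText.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown digest: " + algorithm, e);
        }
    }

    public static void main(String[] args) {
        String canonical = "\"int\"";  // assumed canonical text of the int schema
        System.out.printf("64-bit: %016x%n", fingerprint64(canonical));
        System.out.println("MD5 length: " + digest(canonical, "MD5").length);
    }
}
```

Because every digest is taken over the same canonical bytes, fixing the canonical text is what makes an MD5 or SHA fingerprint reproducible across implementations.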
                
> Fingerprints for Avro Schemas
> -----------------------------
>
>                 Key: AVRO-1006
>                 URL: https://issues.apache.org/jira/browse/AVRO-1006
>             Project: Avro
>          Issue Type: New Feature
>          Components: java
>            Reporter: Raymie Stata
>            Assignee: Raymie Stata
>              Labels: features
>         Attachments: schema-fingerprinting.html, schema-fingerprinting.html
>
>
> Add a function that returns a standardized, 64-bit fingerprint for schemas.
> Fingerprints are designed such that the chance of a collision is very, very
> low.

--
This message is automatically generated by JIRA.