On Dec 2, 2013, at 15:16 , Scott Ribe <scott_r...@elevated-dev.com> wrote:

> On Dec 2, 2013, at 7:57 AM, Marcel Weiher <marcel.wei...@gmail.com> wrote:
> 
>> Then you can twiddle the hash to get you a good compromise of speed vs. 
>> collisions.
> 
> You want to optimize the hash further? Only hash the first 1MB.

Yup, that’s one of the things I meant by twiddling the hash. You could probably 
use even less than 1MB, or maybe mix in more image metadata, a thumbnail, ….
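
A rough sketch of what hashing only the start of the file could look like, in 
Python for brevity. The name prefix_hash, the 1MB default and the choice of 
SHA-1 are all assumptions for illustration, not something anyone in this thread 
has settled on:

```python
import hashlib

# Assumed cutoff: the thread suggests 1MB, possibly less.
PREFIX_BYTES = 1024 * 1024


def prefix_hash(path, limit=PREFIX_BYTES):
    """Return a hex digest of at most `limit` leading bytes of the file.

    Files that share the same leading bytes get the same digest, so a
    match only means "possibly identical", never "identical".
    """
    with open(path, "rb") as f:
        return hashlib.sha1(f.read(limit)).hexdigest()
```

Lowering the limit makes the hash cheaper but raises the chance of collisions, 
which is the speed vs. collisions compromise mentioned above.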

> One note: we're all saying that identity checks will be rare, so the 
> amortized cost is very low. Well. That's true if the user is actually adding 
> different files. The amortized cost is not so low if you have users who keep 
> adding the same files over & over ;-)


Very true.  My mental model was that duplicates would get rejected on addition, 
or at least be discarded once detected, which would mean that you do the 
comparison only once for actual duplicates. That may be inaccurate, but if it’s 
true then the comparison is not that much more expensive than a hash computed 
over the entire data (and could actually be cheaper if the hash function is 
expensive to compute).
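
To make that mental model concrete, here is a minimal sketch in Python. The 
Library class, its add method and the cheap_hash helper are hypothetical names, 
and rejecting duplicates on addition is the assumption described above, not a 
known property of the app in question:

```python
import filecmp
import hashlib


def cheap_hash(path, limit):
    """Hash only the first `limit` bytes (a cheap, collision-prone key)."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read(limit)).hexdigest()


class Library:
    """Hypothetical file store that rejects duplicates on addition."""

    def __init__(self, limit=1024 * 1024):
        self.limit = limit
        self.by_hash = {}  # cheap hash -> paths of distinct files

    def add(self, path):
        """Return True if the file was added, False if it is a duplicate."""
        bucket = self.by_hash.setdefault(cheap_hash(path, self.limit), [])
        for existing in bucket:
            # Full byte-for-byte comparison, run only on a hash match.
            if filecmp.cmp(existing, path, shallow=False):
                return False
        bucket.append(path)
        return True
```

The full comparison runs only when the cheap hashes match. For an actual 
duplicate it runs once, at the moment the file is rejected, and it reads each 
file about once, which is roughly what a hash over the entire data would cost 
anyway.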

Marcel


_______________________________________________

Cocoa-dev mailing list (Cocoa-dev@lists.apple.com)

