Hi James,

> They're very similar, though the smaller image is in fact sharper, a little
> darker, and slightly differently framed.

This wouldn't trigger any bells for us. The two images are too different for us to say, mathematically, that they are similar without also triggering a lot of false positives. If we look at the hashes generated by our blockhash[1] algorithm for those two images, we end up with this:

8000bc409f7c9ffd9cd096689fe883e4f3fd83c583c101e183e101e60073e7bf
80019c819ff99ff18cc1944197e19fe9f7e9c3c983c103c1a3e183ee217004ff

You can see that there is some commonality, but that they're also quite far apart. If we convert these to bits and calculate the Hamming distance (the number of bits that differ) between the two, we end up with a distance of 48 bits (out of 256).

So far, we've found that a maximum distance of 10 bits is usually strict enough to call two images a match. With the draft query for duplicate Commons works that I linked to, though, I've been even more restrictive and not allowed even 1 bit to differ, just to get more reliable matches, at the expense of not matching as many.

Sincerely,
Jonas

_______________________________________________
Commons-l mailing list
https://lists.wikimedia.org/mailman/listinfo/commons-l
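
P.S. For anyone who wants to reproduce the distance figure, here is a minimal Python sketch of the comparison described above. It is only an illustration, not our actual code: the names `hamming_distance` and `is_match` are made up for this example, and the default threshold of 10 bits is simply the rule of thumb mentioned in the mail.

```python
# Illustrative sketch only; these helpers are not part of blockhash itself.

HASH_A = "8000bc409f7c9ffd9cd096689fe883e4f3fd83c583c101e183e101e60073e7bf"
HASH_B = "80019c819ff99ff18cc1944197e19fe9f7e9c3c983c103c1a3e183ee217004ff"


def hamming_distance(hash_a: str, hash_b: str) -> int:
    """Return the number of differing bits between two equal-length hex hashes."""
    if len(hash_a) != len(hash_b):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 in every bit position where the two hashes differ.
    differing_bits = int(hash_a, 16) ^ int(hash_b, 16)
    return bin(differing_bits).count("1")


def is_match(hash_a: str, hash_b: str, max_distance: int = 10) -> bool:
    """Treat two hashes as a match if at most max_distance bits differ.

    The default of 10 is the rule of thumb from the mail (an assumption here);
    pass max_distance=0 to require identical hashes.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance


if __name__ == "__main__":
    print(hamming_distance(HASH_A, HASH_B))  # 48 of 256 bits differ
    print(is_match(HASH_A, HASH_B))          # False: 48 is well above 10
```

Calling `is_match(a, b, max_distance=0)` corresponds to the stricter "not even 1 bit may differ" setting.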
