Hi everyone,

> Careful here - algorithms that spot almost-duplicates will happily
> flag different shots from the same shoot. Definitely not something to
> act upon without close human inspection.
I agree, and I wouldn't want to flag anything automatically based on our findings.

The algorithm we use is meant to capture verbatim re-use, not derivative works. This means it does a very poor job of matching images that are different photographic reproductions of the same work (lighting conditions, angles, borders, etc. will all differ). It does a fairly good job of matching images that are verbatim copies, allowing for resizing and format changes, but it's not perfect: some images definitely end up with the same hash even though they're not identical. This happens often with maps. Take two maps of US states, one marking Washington in red and the other marking California in red. With no other differences, they'll end up hashed very close to each other.

Sincerely,
Jonas

_______________________________________________
Commons-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/commons-l
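To illustrate why near-identical maps can collide, here is a minimal Python sketch of one common perceptual hash, the difference hash (dHash). This is an assumption for illustration only: the message does not say which algorithm is actually used. The images are hypothetical grayscale grids rather than real files, and the "state marked in red" is simplified to a bright 8x8 block. The names `make_map` and `upscale2x` are made up for this example.

```python
from typing import List

Image = List[List[float]]  # grayscale pixels, one list per row


def resize(img: Image, width: int, height: int) -> Image:
    """Downscale by averaging rectangular blocks of source pixels.

    Assumes the source is at least width x height pixels.
    """
    src_h, src_w = len(img), len(img[0])
    out = []
    for y in range(height):
        y0, y1 = y * src_h // height, (y + 1) * src_h // height
        row = []
        for x in range(width):
            x0, x1 = x * src_w // width, (x + 1) * src_w // width
            block = [img[r][c] for r in range(y0, y1) for c in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(img: Image, hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair in a tiny thumbnail."""
    small = resize(img, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance suggests 'probably the same image'."""
    return bin(a ^ b).count("1")


def make_map(highlight_row: int) -> Image:
    """Hypothetical 72x64 'map': left-to-right shading plus one bright 8x8 region."""
    img = [[200 - (x // 8) * 10 for x in range(72)] for _ in range(64)]
    for y in range(highlight_row * 8, highlight_row * 8 + 8):
        for x in range(8, 16):
            img[y][x] = 250  # stands in for the state marked in red
    return img


def upscale2x(img: Image) -> Image:
    """Pixel-doubling resize, standing in for a resized copy of the same file."""
    return [[p for p in row for _ in range(2)] for row in img for _ in range(2)]


washington = make_map(highlight_row=0)
california = make_map(highlight_row=7)
mirrored = [row[::-1] for row in washington]

print(hamming(dhash(washington), dhash(upscale2x(washington))))  # 0: resized copy
print(hamming(dhash(washington), dhash(california)))             # 2 of 64 bits
print(hamming(dhash(washington), dhash(mirrored)))               # 62 of 64 bits
```

In this toy case the resized copy hashes identically and the two "maps" differ in only 2 of 64 bits, so a matcher using a small distance threshold would likely treat them as the same picture, while a genuinely different (mirrored) image lands far away. On a real map, a small highlighted state would usually shift the thumbnail even less.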
