>One way to do this automatically would be to first look for duplicates with a 
>very small number of checksum bytes.  Use <= 4096 to avoid extra read 
>operations.  Then take any that are found to be duplicates and re-scan with a 
>larger md5_size value.  This can be repeated as many times as necessary.

That could be done on each rescan, to ensure that there are no false positive 
duplicates.  It would need to store the number of chunks used to make the 
checksum value, though, which would make comparisons harder.

i.e. if this is going to be used to re-attach persistent record data for a file 
that has moved, you can't just do a lookup on the checksum value alone.

And when adding new music that has a duplicate checksum for one chunk of data, 
how would it know whether the file is a false positive match, or really the 
same file moved to a new location?
_______________________________________________
beta mailing list
[email protected]
http://lists.slimdevices.com/mailman/listinfo/beta
