> One way to do this automatically would be to first look for duplicates
> with a very small number of checksum bytes. Use <= 4096 to avoid extra
> read operations. Then take any that are found to be duplicates and
> re-scan with a larger md5_size value. This can be repeated as many
> times as necessary.
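For reference, the multi-pass approach quoted above could be sketched roughly like this (the function names and the size schedule are illustrative, not from any actual implementation):

```python
import hashlib
from collections import defaultdict

def partial_md5(path, num_bytes):
    """MD5 of the first num_bytes of a file."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        h.update(f.read(num_bytes))
    return h.hexdigest()

def find_duplicates(paths, sizes=(4096, 65536, 1048576)):
    """Start with a small checksum window, then re-scan only the
    candidate duplicate groups with progressively larger windows."""
    groups = [list(paths)]
    for size in sizes:
        next_groups = []
        for group in groups:
            by_sum = defaultdict(list)
            for p in group:
                by_sum[partial_md5(p, size)].append(p)
            # Only groups with more than one member are still candidates.
            next_groups.extend(g for g in by_sum.values() if len(g) > 1)
        groups = next_groups
    return groups
```

Each pass only re-reads files that still look like duplicates, so the extra I/O is limited to the ambiguous cases.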
That could be done on each rescan to ensure that there are no false-positive duplicates. You would need to store the number of chunks used to compute the checksum value, and that would make comparison harder: if this is going to be used to re-attach persistent record data to a file that has moved, you can't just do a lookup on the checksum value. And when adding new music that has a duplicate checksum for one chunk of data, how would it know whether the file was a false-positive match or really the same file that has moved to a new location?

_______________________________________________
beta mailing list
[email protected]
http://lists.slimdevices.com/mailman/listinfo/beta
