Philip Meyer Wrote: 
> >Which SBS version and revision are you using ?
> 7.5.2 SVN 31317
> 
The new Audio::Scan module is only available in 7.6, so the only
benefit compared to the previous Duplicate Detector version in your
case is that cue sheets should now mostly be detected as "Incorrect
Duplicates" instead of "Duplicates". 

Philip Meyer Wrote: 
> 
> I am a bit concerned about the random nature of finding false positive
> duplicates (where checksum and length exactly match, but the songs are
> really different). eg. I could potentially reduce 5000 bytes to 4999
> bytes, and perhaps no duplicates would be found, or increase the
> checksum byte range and end up with more hits.
> 
Is it correct to assume that most "Incorrect duplicates" are either:
- MP3 files encoded with LAME
or
- Music that has silence during the first seconds

Both of these cases should be solved by the new Audio::Scan module in
7.6.

Philip Meyer Wrote: 
> 
> I think if there is going to be some md5 checksum calculation to
> reconnect rescanned files to persistent data, there needs to be some
> additional checking performed to eliminate more false positives. eg.
> check file creation timestamp?
> 
It should just be a matter of which parts of the file you include. It
got a lot better in my setup just by appending the length of the audio
segment of the file to the checksum. Files whose checksum matches but
whose audio length differs are the ones listed as "Incorrect
duplicates".
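
A minimal sketch of that idea in Python, not the plugin's actual code:
hash the first bytes of the audio segment and append the segment
length, so files that share the same leading bytes (for example
silence) but differ in length get different keys. The function name,
the 5000 byte default and the offset/size arguments are assumptions
for illustration; a tag scanner would have to supply where the audio
segment starts and how long it is.

```python
import hashlib


def audio_fingerprint(data: bytes, audio_offset: int, audio_size: int,
                      prefix_bytes: int = 5000) -> str:
    """Return '<md5 of leading audio bytes>-<audio length>'.

    Hypothetical helper: audio_offset and audio_size describe the audio
    segment (file content without tags) and must come from a tag scanner.
    """
    audio = data[audio_offset:audio_offset + audio_size]
    # Checksum only over the first prefix_bytes of the audio segment.
    digest = hashlib.md5(audio[:prefix_bytes]).hexdigest()
    # Appending the length separates files whose leading bytes are equal.
    return f"{digest}-{len(audio)}"
```

Two files with the same checksum part but a different length part
would then be candidates for "Incorrect duplicates" rather than
"Duplicates".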

So the wording in the result shown by the plugin is a bit confusing at
the moment. In your setup, it means that you currently only have 24
real duplicates and, if I understand you correctly, 10 of these were
actually duplicates, so only 14 files are incorrectly identified as
duplicates.

I would prefer not to use the file creation timestamp if I can avoid
it; I'm not sure I can rely on it never being changed. I understand
that the creation timestamp is different from the modification
timestamp, but for the use case I have in mind a file can never change
its identity. 

If you have the time and are willing to try the 7.6 nightly with
SQLite (or MySQL), I would appreciate getting the results from that to
see if it's able to identify the duplicates correctly.


Philip Meyer Wrote: 
> 
> Increasing rescan time by 2 hours is a bit harsh too.
> 
My experience is that it's A LOT faster in 7.6 with the SQLite
database. I haven't bothered to investigate why, since this is just an
experimental plugin at the moment and SQLite is the database engine
that will be used in the future.

Also, the new Audio::Scan module will use the MD5 integrity checksum
inside of FLAC files, so for FLAC encoded music it should be really
fast.
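
To illustrate why this is cheap: the FLAC format stores a 16 byte MD5
of the unencoded audio at the end of the STREAMINFO metadata block at
the start of the file, so no audio data has to be read or hashed. The
Python sketch below reads it from the first 42 bytes of a file. It is
not how Audio::Scan is implemented; the function name is hypothetical,
and it assumes the file starts directly with the "fLaC" marker (no
prepended ID3 tag).

```python
from typing import Optional


def flac_streaminfo_md5(header: bytes) -> Optional[str]:
    """Return the audio MD5 stored in a FLAC STREAMINFO block, or None.

    header must hold at least the first 42 bytes of the file:
    4 (marker) + 4 (block header) + 34 (STREAMINFO).
    """
    if len(header) < 42 or header[:4] != b"fLaC":
        return None
    block_type = header[4] & 0x7F            # top bit is the "last block" flag
    block_len = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_len != 34:   # first block must be STREAMINFO
        return None
    md5 = header[8 + 18:8 + 34]              # last 16 bytes of the block
    if md5 == bytes(16):                     # all zero means "not known"
        return None
    return md5.hex()
```

Usage would be along the lines of opening the file in binary mode,
reading 42 bytes and passing them to the function.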

As mentioned earlier, cue sheets will also be detected as "Incorrect
duplicates" in 7.6 because of a bug in the latest Audio::Scan module.


-- 
erland

Erland Isaksson ('My homepage' (http://erland.isaksson.info))
(Developer of 'many plugins/applets'
(http://wiki.slimdevices.com/index.php/User:Erland). If my answer
helped you and you like to encourage future presence on this forum
and/or third party plugin/applet development, 'donations are always
appreciated' (http://erland.isaksson.info/donate))