Philip Meyer;575815 Wrote:
>
> Is the length item the length of the music content (not file length,
> which would include metadata blocks that could change)? What accuracy
> does the length go to - nearest second, or more accurate?
>
It's the number of compressed bytes in the audio segment, so it's very
accurate. Tags are not included in the length; it's just the audio
data.
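To illustrate what "just the audio data" means for a format like FLAC, here is a minimal Python sketch (Audio::Scan itself is Perl; Python is used here only for illustration) that skips the metadata blocks at the front of a FLAC file and reports the remaining byte count. The function name is made up for the example; the block-header layout (1 byte flags/type, 3 bytes big-endian length) follows the FLAC format:

```python
import os

def flac_audio_length(path):
    """Return the number of compressed audio bytes in a FLAC file,
    i.e. the file size minus the "fLaC" marker and all metadata
    blocks (tags, padding, etc.)."""
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("not a FLAC file")
        last = False
        while not last:
            header = f.read(4)
            last = bool(header[0] & 0x80)       # high bit: last metadata block
            block_len = int.from_bytes(header[1:4], "big")
            f.seek(block_len, os.SEEK_CUR)      # skip the block body
        audio_start = f.tell()
    return os.path.getsize(path) - audio_start
```

Because tags live entirely in the metadata blocks, retagging a file changes the file size but not this length.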
Philip Meyer;575815 Wrote:
>
> I'm not sure that the premise that "the first n bytes of a song and the
> length is unique" is good enough. Perhaps take n bytes at start and n
> bytes at end (but more of a performance penalty for seeking in each
> file).
>
The latest Audio::Scan version (in 7.6) takes n bytes from the middle.
The idea is that this avoids the problems with silent beginnings and
the padding added to LAME-encoded MP3 files. There is a small risk of
silent parts in the middle, especially in the "hidden" tracks on some
albums, which consist of music + a long silent part + "hidden" music.
However, we should also be aware that we are talking about compressed
bytes, so the compression format might affect how well this works.
This is also why we need testing with real music files in large
libraries, to ensure it only detects real duplicates.
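The "n bytes from the middle" idea can be sketched as follows in Python (function name, chunk size, and parameters are made up for the example; this is not Audio::Scan's actual implementation). Given the offset and length of the audio segment, it hashes a chunk centered in that segment:

```python
import hashlib

def middle_chunk_digest(path, audio_offset, audio_length, n=64 * 1024):
    """MD5 over up to n compressed bytes taken from the middle of the
    audio segment, so leading silence and encoder padding at either
    end of the stream do not affect the digest."""
    n = min(n, audio_length)
    middle = audio_offset + (audio_length - n) // 2
    with open(path, "rb") as f:
        f.seek(middle)
        chunk = f.read(n)
    return hashlib.md5(chunk).hexdigest()
```

Since only n bytes are read from one position, the per-file cost is a single seek plus a small read, which is what keeps the scan penalty manageable.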
Philip Meyer;575815 Wrote:
>
> Yes, only 14. But, depending on the logic that this duplicate checking
> may be used for, any false positive duplicate detections may be
> problematic.
>
We need to get down to zero false positives; in the worst case we
would need to calculate the checksum over all audio bytes in the file.
This actually already happens with FLAC files in the latest
Audio::Scan version in 7.6, since it takes the MD5 built into the FLAC
format instead of calculating it.
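For reference, that built-in checksum is the MD5 signature stored in the last 16 bytes of the 34-byte STREAMINFO metadata block (block type 0), covering the unencoded audio. A minimal Python sketch of reading it (function name made up for the example):

```python
def flac_streaminfo_md5(path):
    """Return the MD5 signature of the unencoded audio as stored in a
    FLAC file's STREAMINFO block, so no checksum needs to be computed
    over the audio data itself."""
    with open(path, "rb") as f:
        if f.read(4) != b"fLaC":
            raise ValueError("not a FLAC file")
        last = False
        while not last:
            header = f.read(4)
            last = bool(header[0] & 0x80)       # high bit: last metadata block
            block_type = header[0] & 0x7F
            block_len = int.from_bytes(header[1:4], "big")
            body = f.read(block_len)
            if block_type == 0:                 # STREAMINFO
                return body[18:34].hex()        # last 16 of the 34 body bytes
    raise ValueError("no STREAMINFO block found")
```

Note that this MD5 is over the decoded samples, so two FLAC files with identical audio but different compression levels still get the same digest.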
Philip Meyer;575815 Wrote:
>
>
> >> Increasing rescan time by 2 hours is a bit harsh too.
> >>
> >My experience is that it's A LOT faster in 7.6 with the SQLite
> >database.
> >
> I can't see why DB engine would make any difference here. If it is,
> there's something seriously wrong. Surely the time is in file I/O seek
> time and processing to calculate the checksum. A lookup on a checksum
> column with a suitable index is negligible in relation to that?
>
I agree, but it's definitely faster with 7.6 in my setup. I suspect it
is caused either by changes in the background scheduler or by the
SQLite database inserting data faster due to different transactional
behavior, but I haven't investigated it in detail. The detection only
does INSERTs using generated primary keys, and it drops all indexes
before the detection begins and re-applies them at the end.
--
erland
Erland Isaksson ('My homepage' (http://erland.isaksson.info))
(Developer of 'many plugins/applets'
(http://wiki.slimdevices.com/index.php/User:Erland). If my answer
helped you and you like to encourage future presence on this forum
and/or third party plugin/applet development, 'donations are always
appreciated' (http://erland.isaksson.info/donate))
------------------------------------------------------------------------
erland's Profile: http://forums.slimdevices.com/member.php?userid=3124
View this thread: http://forums.slimdevices.com/showthread.php?t=81679
_______________________________________________
beta mailing list
[email protected]
http://lists.slimdevices.com/mailman/listinfo/beta