On Dec 30, 2015 3:17 PM, "Dave Morriss" <[email protected]> wrote:
> Using a recent HPR show in my podcast queue and running
> echoprint-codegen on the entire thing I found I got a chunk of JSON with
> metadata and a humongous fingerprint string.

After spending some time reading the server-side code, I found out the
"fingerprint" is an encoded (details in an upcoming episode) list of
timestamped "onset events" from the audio, which is why the fingerprint's
length is correlated with the length of the audio. That list then has to be
fuzzy (fuzzily?) matched against a candidate, essentially by counting how many
events the two have in common and whether they occur the same distance apart
(again, more details to come).

> Then I started wondering how much you'd need to chop off a new show
> given that any intro might be in a multitude of formats and of a
> variable length.

The codegen tool uses ffmpeg, so it should support a lot of formats out of
the box. And if we're only checking whether the very beginning of an upload
matches the intro, selecting a good sample shouldn't be too hard.

> Then I realised I was probably out of my depth.

You and I both. Fortunately (unfortunately?) that hasn't stopped me yet.

> I'll be fascinated to know how people cleverer than I am work this out,
> and look forward to the show on it!

You know what they say: give a man a hammer and he'll fish for nails, teach a
man to code, and he'll waste hours using awk to analyse audio and misquoting
proverbs.

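P.S. For the curious, the matching idea described above (count the events two
fingerprints have in common, then check whether they sit the same distance
apart) can be sketched roughly as follows. This is only an illustration in
Python, not the server's actual code: the function name `match_score`, the
(code, time) pair layout, and the sample numbers are all assumptions made up
for the example.

```python
from collections import Counter, defaultdict


def match_score(query, candidate):
    """Return (count, offset) for the best alignment of two event lists.

    Both arguments are lists of (code, time) pairs. This layout is an
    assumption for illustration, not the real decoded fingerprint format.
    """
    # Index the candidate's timestamps by event code for quick lookup.
    times_by_code = defaultdict(list)
    for code, t in candidate:
        times_by_code[code].append(t)

    # For every event code the two lists share, record the time offset
    # between the query event and the candidate event. Events that occur
    # "the same distance apart" in both lists all vote for the same offset.
    offsets = Counter()
    for code, t in query:
        for candidate_t in times_by_code.get(code, ()):
            offsets[candidate_t - t] += 1

    if not offsets:
        return 0, None

    # The most popular offset is the best alignment; its vote count is
    # the score.
    best_offset, best_count = offsets.most_common(1)[0]
    return best_count, best_offset


# Hypothetical data: an "intro" and an upload containing it 10 ticks later.
intro = [(101, 0), (202, 5), (303, 9), (404, 14)]
upload = [(999, 2), (101, 10), (202, 15), (303, 19), (555, 21), (404, 24)]
print(match_score(intro, upload))
```

A recording that merely shares a few codes at unrelated positions spreads its
votes over many different offsets, so its best count stays low. A real matcher
would presumably also tolerate small timing jitter rather than demanding exact
offsets.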
_______________________________________________
Hpr mailing list
[email protected]
http://hackerpublicradio.org/mailman/listinfo/hpr_hackerpublicradio.org
