On Nov 29, 2010, at 8:50 PM, Element Green wrote:

On Thu, Nov 25, 2010 at 9:33 PM, robert bristow-johnson
<r...@audioimagination.com> wrote:

depending on how big your "window" is, i think a better term for this is *cross-correlation*, not autocorrelation. it's a single stream of audio, so in a sense of the word it *is* autocorrelation, but what i normally think of with that semantic is something where the lag is no bigger, or not much bigger, than the analysis window of either the loop-end region of the audio or the loop-begin region.

if the loop points are separated by a much longer time (number of samples) than the size (in samples) of the two slices of audio being correlated, it's really cross-correlation. and you might find poor correlation at all of the lags you're looking at. in fact, cross-correlating one part of the tone or sound against another part that has a rapid change in amplitude envelope might fool your correlation into thinking there is a good match when there really isn't (the cross-correlation increases because the amplitude is increasing, not necessarily because of a good match).

so, instead of either cross or autocorrelation, you might want to consider AMDF between the loop end and potential candidates to loop back to. instead
of looking for a maximum, you're looking for a minimum and a very low
minimum means a good match (or a bad match during a very low signal level).
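a quick sketch of that AMDF idea, comparing the window around the loop-end point against candidate loop-begin windows. the window length, sample rate, and sine test signal here are just assumptions for illustration:

```python
import numpy as np

def amdf_match(signal, end_idx, begin_candidates, win=256):
    """Average magnitude difference between the window around the
    loop-end point and the window around each candidate loop-begin.
    A low value means a good match (or a very low signal level)."""
    end_win = signal[end_idx - win // 2 : end_idx + win // 2]
    scores = []
    for b in begin_candidates:
        beg_win = signal[b - win // 2 : b + win // 2]
        scores.append(np.mean(np.abs(end_win - beg_win)))
    return np.array(scores)

# a perfectly periodic test signal: the best candidate is a whole
# number of periods back from the loop-end point
sr, f0 = 48000, 100.0
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * f0 * t)
period = int(sr / f0)                       # 480 samples
end = 40000
cands = [end - period, end - period - 37]   # on-period vs off-period
s = amdf_match(x, end, cands)
```

the on-period candidate scores essentially zero; the off-period one doesn't.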

Looking at the equation here for AMDF:
http://mi.eng.cam.ac.uk/~ajr/SpeechAnalysis/node72.html

It seems like the algorithm I came up with independently is very
similar.  The absolute value of the difference of the sample points is
taken as with AMDF.  Prior to summing the values together though, I'm
multiplying by the window I described before (with a peak in the
center where the loop point is), giving samples closer to the loop
point more weight.
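A sketch of that weighted variant. The triangular `np.bartlett` weight here is just a stand-in for whatever peaked window shape is actually in use:

```python
import numpy as np

def weighted_amdf(a, b, weight=None):
    """AMDF between two equal-length slices, with a weight that
    emphasizes samples near the center (where the loop point sits)."""
    d = np.abs(a - b)
    if weight is None:
        weight = np.bartlett(len(d))   # triangular, peak at center (an assumption)
    return np.sum(d * weight) / np.sum(weight)

# a sample-sized error at the center should cost more than one at the edge
silent = np.zeros(101)
center_err = silent.copy()
center_err[50] = 1.0
edge_err = silent.copy()
edge_err[0] = 1.0
```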

it's AMDF (with a nicer window). that link above shows the AMDF with a rectangular window. usually "no window" means "rectangular window", and usually we think that rectangular ain't the best kind of window.

you can also square the difference (which is the same thing as squaring the abs value). i might call this the ASDF. a continuous-time representation (used for pitch detection) is depicted as Eq. (1) in the Wavetable-101.pdf paper you can find at the music-dsp site. in fact, you can raise the abs value of the difference to any positive power. this is really the Lp norm, where AMDF would be the L1 norm and ASDF would be the L2 norm. the higher the power, the more it emphasizes the bigger errors (de-emphasizing the little errors).
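the Lp idea in a few lines (the value of p and the toy arrays are arbitrary):

```python
import numpy as np

def lp_difference(a, b, p=2):
    """Lp-norm difference function: p=1 gives AMDF, p=2 gives ASDF.
    Higher p puts more weight on the larger errors."""
    return np.mean(np.abs(a - b) ** p)

# four small errors of 0.5 vs one big error of 1.0
flat = np.zeros(4)
small_errs = np.full(4, 0.5)
one_big = np.array([1.0, 0.0, 0.0, 0.0])
```

with p=1 the many small errors score worse; with p=4 the single big error scores worse, which is the emphasis-on-big-errors effect described above.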

one more thing about the L2 norm and ASDF is that it can be related directly to a form of autocorrelation. the ASDF is really an upside-down autocorrelation (with an offset), so the ASDF will have nulls or valleys precisely where the autocorrelation has peaks.
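that relationship is easy to check numerically: expanding the square in the ASDF gives the two energy terms minus twice the correlation term, which is where the upside-down-autocorrelation-plus-offset shape comes from (the signal and lag here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
N, k = 1024, 300
a, b = x[:N], x[k:k + N]

# ASDF at lag k over an N-sample window
asdf = np.mean((a - b) ** 2)

# expanded form: energy(a) + energy(b) - 2 * correlation(a, b)
identity = np.mean(a * a) + np.mean(b * b) - 2 * np.mean(a * b)
```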

In practice this seems to work quite well and I'm going to leave it as
is for now.  It seems reasonably fast and straightforward.


find good loop points, then crossfade.

another thing about crossfading is that there is something you can do to adapt a little to better or poorer loop points. if the loop points (and the windows surrounding them) match well, then you're crossfading between coherent audio and a constant-voltage crossfade is indicated (when the crossfade is half done, both the fade-out and fade-in envelopes are at 50%). if the loop points are not well matched (but they're the best loop points your correlation function can find), then you want a crossfade that is closer to a constant-power crossfade, where both fade-in and fade-out envelopes are at 70.7% at the midpoint of the crossfade. there is a way to define the optimal crossfade function for any normalized correlation between 0 (like crossfading white noise to white noise) and 1 (like crossfading a perfectly periodic waveform to a similar-looking portion of the waveform at loop start).

does any of this make any sense?


I'm not sure I'm following you.  From what I can understand it sounds
like you are saying that the degree to which the two loop point signal
windows match could be used to select different cross fade envelope
curves, for a better perceptual cross fade.

yes.  i am saying exactly that.

 I hadn't given this much
thought and just assumed a linear cross fade (0-100%) would be the way
to do it (that is from a limited DSP background mind you).  I am
intrigued by this idea though.  Any tips on how to generate the
envelope functions and what sort of equation could be used for
selecting the optimal envelope based on the signal correlation?

Olli has responded with the two particular end-points. if the two loop points are well correlated, you want to use the linear crossfading you planned, where the fade-in and fade-out functions always add to 1 (that's what i call a "constant voltage" crossfade). but if the two loop points are completely uncorrelated (like you would get for white noise), then you want crossfade envelopes that are the sqrt() of your constant-voltage crossfade, so that the squares of your envelopes add to 1 (i call that a "constant power" crossfade).
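the two end-point envelope pairs, concretely (the envelope length is arbitrary):

```python
import numpy as np

n = 1025
v = np.linspace(0.0, 1.0, n)          # crossfade progress, 0..1

# constant-voltage (linear): fade_in + fade_out == 1 everywhere
cv_in, cv_out = v, 1.0 - v

# constant-power: sqrt of the linear pair, so in^2 + out^2 == 1 everywhere
cp_in, cp_out = np.sqrt(v), np.sqrt(1.0 - v)
```

at the midpoint the linear pair is at 0.5 and the sqrt pair at sqrt(0.5) = 0.707, which is the 70.7% figure.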

you should always be able to find loop points that have correlation of at least 0 (completely uncorrelated). even if it's real crap that you are splicing to other real crap, assuming both signals have the DC removed, the correlation function will have no DC component and will take on both negative and positive values (at different lags). you will always want to choose a lag with the largest correlation (a normalized correlation as close to 1 as possible), so your correlation will be better than what you would get with completely uncorrelated white noise.

a few years ago i was investigating this and was considering writing a paper for the AES about it. there is a theory that you can design (on the fly) optimal crossfade envelopes for any normalized correlation between 0 and 1. if you want, i can dig up the notes and equations about it.
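one construction that at least satisfies both end-point cases: keep the *difference* of the two envelopes linear, and solve for their *sum* so that in^2 + out^2 + 2*r*in*out stays at 1 for the whole crossfade. (this particular parameterization is an illustrative assumption, not necessarily the one in those notes.)

```python
import numpy as np

def crossfade_envelopes(v, r):
    """Fade-in/fade-out pair for crossfade progress v in [0, 1], given
    normalized correlation r in [0, 1] between the two signals.
    Constructed so in^2 + out^2 + 2*r*in*out == 1 for all v:
    r = 1 reduces to the linear (constant-voltage) crossfade,
    r = 0 reduces to the constant-power crossfade."""
    d = 2.0 * v - 1.0                                   # fade_in - fade_out, linear
    s = np.sqrt((2.0 - (1.0 - r) * d * d) / (1.0 + r))  # fade_in + fade_out
    return (s + d) / 2.0, (s - d) / 2.0                 # fade_in, fade_out
```

plugging the sum and difference back in: in^2 + out^2 + 2*r*in*out = ((1+r)*s^2 + (1-r)*d^2)/2, which the choice of s pins to 1.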

can i ask what the application is? (i may have missed it, but i'll look at earlier posts.) if it's looping for sound/instrument samples, this is an analysis thing that is not real-time, and we can consider finding the best loop-begin points for a large variety of possible loop-end points, then pick the pair that looks best, given whatever your measure of good is. but in a (time-domain) real-time pitch shifter, you may not have that many choices available. you might find yourself in a situation where your loop-end is pretty well defined, and you have to find a place to splice to and take the best that you can get from that.


It's a sample/instrument editor, so it's all non-realtime.

then what you should do is have your editor test a variety of loop-end times and for each of those, find the best loop-begin point (according to your AMDF or ASDF measure). so you would have a list of loop-end candidates, each with their optimal loop-begin point (where the begin point and end point match the best) and you would choose the loop-end candidate that has the best match to its associated loop-begin.
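that search, sketched with ASDF as the match measure. the window size, candidate lists, and test signal are placeholders:

```python
import numpy as np

def best_loop(signal, end_candidates, begin_range, win=256):
    """For each candidate loop-end, find the loop-begin in begin_range
    that minimizes the ASDF between the two surrounding windows, then
    return the (begin, end, score) triple with the best overall match."""
    best = None
    for end in end_candidates:
        e = signal[end - win // 2 : end + win // 2]
        for beg in begin_range:
            b = signal[beg - win // 2 : beg + win // 2]
            score = np.mean((e - b) ** 2)          # ASDF
            if best is None or score < best[0]:
                best = (score, beg, end)
    return best[1], best[2], best[0]

# periodic test signal (100-sample period): the winner should be a
# whole number of periods away from its loop-end
x = np.sin(2 * np.pi * np.arange(6000) / 100.0)
beg, end, score = best_loop(x, [5000, 5003], range(200, 400))
```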

if this is for a simple sampler, and if your sampler can more simply "jump" from the loop-end to loop-begin point without the crossfade in real-time, then you can *pre*-crossfade the audio data so that the jump is as seamless as it would be if it were crossfaded. that's an old sampling keyboard trick (from the 1980s). it works very well for good matches (well correlated). not sure it would be very good for poor matches, but splices around poor matches sound like crap anyway.
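a sketch of that pre-crossfade trick, baking a constant-voltage blend into the samples leading up to loop-end so a plain jump back to loop-begin lands on matching audio. the fade length and indices are arbitrary:

```python
import numpy as np

def pre_crossfade(signal, loop_begin, loop_end, fade_len=128):
    """Bake the crossfade into the sample data: the fade_len samples
    before loop_end are replaced by a constant-voltage blend of
    themselves and the fade_len samples before loop_begin, so a hard
    jump from loop_end back to loop_begin continues seamlessly."""
    out = signal.copy()
    ramp = np.linspace(0.0, 1.0, fade_len, endpoint=False)
    tail = out[loop_end - fade_len : loop_end]       # audio before the jump
    pre = out[loop_begin - fade_len : loop_begin]    # audio before the target
    out[loop_end - fade_len : loop_end] = (1.0 - ramp) * tail + ramp * pre
    return out

# a toy "signal" just to show which samples get touched
x = np.arange(1000, dtype=float)
out = pre_crossfade(x, loop_begin=300, loop_end=900)
```

by the last pre-loop-end sample the blend is almost entirely the audio from just before loop-begin, so the jump to loop-begin is as smooth as a real-time crossfade would have been.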

--

r b-j                  r...@audioimagination.com

"Imagination is more important than knowledge."




--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp 
links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
