On Nov 29, 2010, at 8:50 PM, Element Green wrote:
On Thu, Nov 25, 2010 at 9:33 PM, robert bristow-johnson
<r...@audioimagination.com> wrote:
depending on how big your "window" is, i think a better term for this is *cross-correlation*, not autocorrelation. it's a single stream of audio, so in a sense of the word it *is* autocorrelation, but what i normally think of with that semantic is something where the lag is no bigger, or not much bigger, than the analysis window of either the loop-end region of the audio or the loop-begin.
if the loop points are separated by a much longer time (number of samples) than the size (in samples) of the two slices of audio being correlated, it's really cross-correlation, and you might find poor correlation at all of the lags you're looking at. in fact, doing cross-correlation from one part of the tone or sound to another part that has a rapid change in amplitude envelope might fool your correlation into thinking there is a good match when there really isn't (because the amplitude is increasing, the cross-correlation increases, but not necessarily because of a good match).
so, instead of either cross- or autocorrelation, you might want to consider the AMDF between the loop end and potential candidates to loop back to. instead of looking for a maximum, you're looking for a minimum, and a very low minimum means a good match (or a bad match during a very low signal level).
Looking at the equation here for AMDF:
http://mi.eng.cam.ac.uk/~ajr/SpeechAnalysis/node72.html
It seems like the algorithm I came up with independently is very similar. The absolute value of the difference of the sample points is taken, as with AMDF. Prior to summing the values together, though, I'm multiplying by the window I described before (with a peak in the center, where the loop point is), giving samples closer to the loop point more weight.
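For concreteness, the windowed difference measure described above (absolute sample differences, weighted by a window peaked at the loop point, then summed) might be sketched like this. The function name, the triangular window shape, and the parameters are illustrative assumptions, not code from the editor:

```python
import numpy as np

def windowed_amdf(x, loop_end, loop_begin, half_width):
    """Windowed AMDF between the regions around two candidate loop points.

    Sums |x[loop_end + k] - x[loop_begin + k]| for k in
    [-half_width, half_width], weighted by a triangular window that
    peaks at k = 0 (the loop point itself), so samples nearer the loop
    point count for more.
    """
    k = np.arange(-half_width, half_width + 1)
    w = 1.0 - np.abs(k) / (half_width + 1)   # peak weight at the loop point
    return np.sum(w * np.abs(x[loop_end + k] - x[loop_begin + k]))

# usage: on a sine with period 50 samples, a lag of exactly one period
# matches perfectly, while a fractional-period lag does not
x = np.sin(2 * np.pi * np.arange(1000) / 50.0)
good = windowed_amdf(x, 500, 450, 20)   # lag = one period
bad  = windowed_amdf(x, 500, 463, 20)   # lag = part of a period
```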
it's AMDF (with a different window). that link above shows the AMDF with a rectangular window. usually "no window" means "rectangular window", and usually we think that rectangular ain't the best kind of window.
you can also square the difference (which is the same thing as squaring the abs value). i might call this the ASDF. a continuous-time representation (used for pitch detection) is depicted as Eq. (1) in the Wavetable-101.pdf paper you can find at the music-dsp site. in fact, you can raise the abs value of the difference to any power that's a positive number. this is really the Lp norm, where AMDF would be the L1 norm and ASDF would be the L2 norm. the higher the power, the more it emphasizes the bigger errors (de-emphasizing the little errors).
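The Lp family might be sketched as follows (an illustrative helper, not any particular library's API). Note how two error patterns with the same L1 measure are ranked differently by a higher p, which punishes the single big error more:

```python
import numpy as np

def lp_difference(a, b, p):
    """Lp-style difference measure between two equal-length slices:
    p = 1 is the AMDF-like measure, p = 2 the ASDF-like one; higher p
    emphasizes a few big errors over many little ones."""
    return np.sum(np.abs(np.asarray(a) - np.asarray(b)) ** p)

# two error patterns with the same total absolute error (L1 = 4) ...
l1_spread = lp_difference([0, 0], [2, 2], 1)   # errors 2 and 2
l1_spiky  = lp_difference([0, 0], [1, 3], 1)   # errors 1 and 3
# ... but the spiky one scores worse under the L2 (ASDF-like) measure
l2_spread = lp_difference([0, 0], [2, 2], 2)   # 4 + 4 = 8
l2_spiky  = lp_difference([0, 0], [1, 3], 2)   # 1 + 9 = 10
```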
one more thing about the L2 norm and ASDF is that it can be related directly to a form of autocorrelation. the ASDF is really an upside-down autocorrelation (with an offset), so the ASDF will have nulls or valleys precisely where the autocorrelation has peaks.
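That relationship is easy to check numerically: expanding the square in the ASDF sum gives the two energy terms (the "offset") minus twice the lagged autocorrelation term, so minimizing one is maximizing the other. A small sketch (variable names are illustrative):

```python
import numpy as np

# numeric check: sum((x[n] - x[n+tau])^2) expands to the two energy
# terms (the "offset") minus twice the lagged autocorrelation term,
# which is why ASDF valleys sit exactly where autocorrelation peaks
rng = np.random.default_rng(0)
x = rng.standard_normal(256)
tau, N = 7, 200
a, b = x[:N], x[tau:tau + N]
asdf   = np.sum((a - b) ** 2)
acf    = np.sum(a * b)                     # lagged autocorrelation term
offset = np.sum(a ** 2) + np.sum(b ** 2)   # the energy "offset"
```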
In practice this seems to work quite well and I'm going to leave it as is for now. It seems reasonably fast and straightforward.
find good loop points, then crossfade.
another thing about crossfading is that there is something you can do to adapt a little to better or poorer loop points. if the loop points (and the windows surrounding them) match well, then you're doing a crossfade between coherent audio and a constant-voltage crossfade is indicated (when the crossfade is half done, both the fade-out and fade-in envelopes are at 50%). if the loop points are not well matched (but they're the best loop points your correlation function can find), then you want to do a crossfade that is closer to a constant-power crossfade, where both fade-in and fade-out envelopes are at 70.7% at the midpoint of the crossfade. there is a way to define the optimal crossfade function for any correlation between 0 (when it's like crossfading white noise to white noise) and 100% (like crossfading a perfectly periodic waveform to a similarly appearing portion of the waveform at loop start).
does any of this make any sense?
I'm not sure I'm following you. From what I can understand it sounds
like you are saying that the degree to which the two loop point signal
windows match could be used to select different cross fade envelope
curves, for a better perceptual cross fade.
yes. i am saying exactly that.
I hadn't given this much
thought and just assumed a linear cross fade (0-100%) would be the way
to do it (that is from a limited DSP background mind you). I am
intrigued by this idea though. Any tips on how to generate the
envelope functions and what sort of equation could be used for
selecting the optimal envelope based on the signal correlation?
Olli has responded with the two particular end-points. if the two
loop points are well correlated, you want to use the linear
crossfading you planned (where the fade-in and fade-out functions
always add to 1, that's what i call a "constant voltage" crossfade).
but if the two loop points are completely uncorrelated (like you would
get for white noise), then you want crossfade envelopes that are the
sqrt() of your constant voltage crossfade (you want the square of your
envelopes to add to 1, i call that a "constant power" crossfade).
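As a quick sketch of those two end-points, assuming a linear underlying fade shape (all names are illustrative):

```python
import numpy as np

# the two crossfade end-points: a constant-voltage pair (envelopes sum
# to 1, midpoint at 50%) and its square root, the constant-power pair
# (squares of the envelopes sum to 1, midpoint at sqrt(0.5) ~ 70.7%)
t = np.linspace(0.0, 1.0, 101)
fade_in_v  = t                       # constant voltage: in + out == 1
fade_out_v = 1.0 - t
fade_in_p  = np.sqrt(fade_in_v)      # constant power: in^2 + out^2 == 1
fade_out_p = np.sqrt(fade_out_v)
```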
you should always be able to find loop points that have correlation of
at least 0 (completely uncorrelated). even if it's real crap that you
are splicing to other real crap, assuming both signals have the DC
removed, the correlation function will have no DC component and both
negative values and positive values (with different lags). you will
always want to choose a lag with the largest correlation (a normalized
correlation as close to 1 as possible). so your correlation will be
better than what you would get if it was completely uncorrelated white
noise.
a few years ago i was investigating this and was considering writing a
paper for the AES about it. there is a theory that you can design (on
the fly) optimal crossfade envelopes for any normalized correlation
between 0 and 1. if you want, i can dig up the notes and equations
about it.
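Without presuming what rbj's unpublished derivation looks like, one plausible way to formalize the idea: if the two signals have normalized correlation rho at the splice, the power of the mix under envelopes a and b is a^2 + b^2 + 2*rho*a*b, so you can start from the linear constant-voltage pair and rescale so that this quantity equals 1 everywhere. That recovers the linear crossfade at rho = 1 and the constant-power crossfade at rho = 0. A sketch under that assumption (not the method from rbj's notes):

```python
import numpy as np

def crossfade_envelopes(t, rho):
    """Fade-out/fade-in envelope pair that keeps the expected output
    power constant when splicing two signals whose normalized
    correlation at the splice is rho.

    Starts from the linear constant-voltage pair (f, g) and rescales so
    that f^2 + g^2 + 2*rho*f*g == 1 for all t. At rho = 1 this reduces
    to the linear crossfade; at rho = 0 it is the constant-power one.
    """
    f, g = 1.0 - t, t   # linear fade-out / fade-in
    s = 1.0 / np.sqrt(f * f + g * g + 2.0 * rho * f * g)
    return s * f, s * g

t = np.linspace(0.0, 1.0, 101)
a1, b1 = crossfade_envelopes(t, 1.0)   # coherent: linear, sums to 1
a0, b0 = crossfade_envelopes(t, 0.0)   # uncorrelated: constant power
```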
can i ask what the application is? (i may have missed it, but i'll look at earlier posts.) if it's looping for sound/instrument samples, this is an analysis thing that is not real-time, and we can consider finding the best loop-begin points for a large variety of possible loop-end points, then pick the pair that looks best, given whatever your measure of good is. but in a (time-domain) real-time pitch shifter, having so many choices may not be available to you. you might find yourself in a situation where your loop-end is pretty well defined; you have to find a place to splice to and take the best that you can get from that.
It's a sample/instrument editor, so it's all non-realtime.
then what you should do is have your editor test a variety of loop-end
times and for each of those, find the best loop-begin point (according
to your AMDF or ASDF measure). so you would have a list of loop-end
candidates, each with their optimal loop-begin point (where the begin
point and end point match the best) and you would choose the loop-end
candidate that has the best match to its associated loop-begin.
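That search might be sketched like this (rectangular-window AMDF for brevity; all names and parameters are illustrative assumptions):

```python
import numpy as np

def best_loop(x, end_candidates, begin_range, half_width):
    """For each candidate loop-end, find the loop-begin whose
    surrounding slice minimizes a rectangular-window AMDF, and return
    the best (distance, loop_end, loop_begin) triple overall."""
    k = np.arange(-half_width, half_width + 1)
    best = None
    for end in end_candidates:
        ref = x[end + k]
        for begin in begin_range:
            d = np.sum(np.abs(ref - x[begin + k]))
            if best is None or d < best[0]:
                best = (d, end, begin)
    return best

# usage: on a sine with period 50, the winning pair should be a whole
# number of periods apart, with a near-zero difference measure
x = np.sin(2 * np.pi * np.arange(1000) / 50.0)
d, end, begin = best_loop(x, [600, 650], range(100, 200), 10)
```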
if this is for a simple sampler, and if your sampler can more simply
"jump" from the loop-end to loop-begin point without the crossfade in
real-time, then you can *pre*-crossfade the audio data so that the
jump is as seamless as it would be if it were crossfaded. that's an
old sampling keyboard trick (from the 1980s). it works very well for
good matches (well correlated). not sure it would be very good for
poor matches, but splices around poor matches sound like crap anyway.
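The pre-crossfade trick might look like this: fade the audio approaching the loop end into the audio approaching the loop begin, so that a hard jump from loop-end to loop-begin lands on continuous material. A hedged sketch (constant-voltage fade assumed, as for well-matched points; names are illustrative):

```python
import numpy as np

def pre_crossfade(x, loop_begin, loop_end, n_fade):
    """Bake a crossfade into the sample data so a simple jump from
    loop_end back to loop_begin is seamless: the n_fade samples before
    loop_end are linearly faded into the n_fade samples before
    loop_begin. Only that region is modified."""
    y = x.astype(float).copy()
    t = np.linspace(0.0, 1.0, n_fade, endpoint=False)
    a = y[loop_end - n_fade:loop_end].copy()     # approaching loop end
    b = y[loop_begin - n_fade:loop_begin].copy() # approaching loop begin
    y[loop_end - n_fade:loop_end] = (1.0 - t) * a + t * b
    return y

# usage on arbitrary material: the start of the fade region is
# untouched, and by its end the data is almost entirely the audio that
# precedes loop_begin, so the jump back is (nearly) continuous
rng = np.random.default_rng(1)
x = rng.standard_normal(1000)
y = pre_crossfade(x, 300, 800, 100)
```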
--
r b-j r...@audioimagination.com
"Imagination is more important than knowledge."
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp