Now here is what I understand of the theory behind PSOLA with a 2N-sized window:

Say the period is N, and we break the signal into Hann-windowed grains of size 2N taken every N samples (50% overlap). For a perfectly periodic signal, all these grains are identical apart from a shift of kN. Let these grains be h(t+kN), where h(t) is 0 outside (-N,N). Then the original signal is the convolution of a pulse train with h(t). PSOLA takes this pulse train as the glottal wave and h(t) as the vocal tract response. The rest is fairly straightforward. One notable point is that wherever you put the Hann windows, they are centred at the glottal pulses by definition. Different alignments of the pulses will produce different h(t)'s, though.
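
A minimal sketch of that decomposition, assuming a perfectly periodic test signal and a periodic Hann window (the period N = 8 and the test waveform are made up for illustration). It checks that the grains are identical and that overlap-adding them at hop N gives back the signal away from the edges:

```python
import math

def hann(length):
    """Periodic Hann window; copies at hop length/2 sum to exactly 1."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]

def extract_grains(x, N):
    """Cut x into Hann-windowed grains of size 2N taken every N samples."""
    w = hann(2 * N)
    return [[x[s + n] * w[n] for n in range(2 * N)]
            for s in range(0, len(x) - 2 * N + 1, N)]

def overlap_add(grains, hop, length):
    """Sum the grains back together, placing grain k at offset k*hop."""
    y = [0.0] * length
    for k, g in enumerate(grains):
        for n, v in enumerate(g):
            y[k * hop + n] += v
    return y

N = 8  # hypothetical period in samples
one_period = [math.sin(2 * math.pi * n / N) + 0.5 * math.cos(4 * math.pi * n / N + 0.3)
              for n in range(N)]
x = one_period * 6                   # perfectly periodic test signal
grains = extract_grains(x, N)        # every grain is the same h(t), shifted by kN
y = overlap_add(grains, N, len(x))   # pulse train (spacing N) convolved with h(t)
```

Only the first and last N samples of y differ from x, because a single window covers them. A real voice signal is only approximately periodic, so the grains would be similar rather than identical.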

In this scenario LP-PSOLA is but another way to get h(t). In this case h(t) is a Hann-windowed grain of size 2N convolved with the LP filter. If the LP residual behaves like noise, then the grains associated with different alignments would look much like each other. I'm not sure how much that helps, but LP-PSOLA does solve the gap issue with large down-shifting.

Xue


-----Original Message----- From: Ross Bencina
Sent: Wednesday, October 23, 2013 2:19 PM
To: A discussion list for music-related DSP
Subject: Re: [music-dsp] PSOLA pitch shifting - resample or not?

Hi Guys,

It seems to me that the missing link here is recognising the theory
behind this approach:

The idea is to isolate each vocal tract filtered glottal pulse in its
own grain (i.e. glottal pulse convolved with the impulse response of the
vocal tract). Thus changing the grain rate is more or less equivalent to
changing the glottal pulse rate while the vocal tract IR remains
unchanged (except you're also convolving with a window).

If the IR length is longer than the fundamental period you won't be able
to isolate the pulses exactly. But if the IR is shorter than the period
then you would expect lowering the frequency to add gaps. Similarly,
raising the frequency would increase overlap of each filtered glottal pulse.
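
A rough sketch of the re-spacing idea, assuming one fixed grain of 2N samples stands in for a single filtered glottal pulse (the function names and the values N = 8, M = 20 and M = 6 are illustrative only). Placing copies every M samples sets the output period to M; spacing wider than the grain leaves silent gaps, and narrower spacing makes the copies overlap:

```python
import math

def place_grains(grain, M, count):
    """Overlap-add `count` copies of one grain at an output spacing of M samples."""
    out = [0.0] * (M * (count - 1) + len(grain))
    for k in range(count):
        for n, v in enumerate(grain):
            out[k * M + n] += v
    return out

def gap_samples(grain_len, M):
    """Samples per output period that no grain covers (0 if grains touch or overlap)."""
    return max(0, M - grain_len)

N = 8  # hypothetical input period, so the grain is 2N = 16 samples long
grain = [0.5 - 0.5 * math.cos(2 * math.pi * n / (2 * N)) for n in range(2 * N)]

down = place_grains(grain, 20, 4)  # M = 20 > 2N: lowered pitch, gaps appear
up = place_grains(grain, 6, 4)     # M = 6 < N: raised pitch, grains pile up
```

With these numbers, each 20-sample output period of the lowered version has 4 samples that stay at exactly zero.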

What I'd like to know is: what's the best way of centering the windows on
the pulses? And is it better to use asymmetrical windows?

Ross.


On 23/10/2013 2:05 AM, Robert Bielik wrote:
Wen Xue wrote 2013-10-22 16:53:
One issue I find with 2N is that if you downshift by more than one
octave you get gaps between the grains.

Exactly. This is the point :) Otherwise you won't get the impression
that you've downshifted the pitch that much.

In such a case I'm thinking you may use something like 3N or 4N or 5N so
that the output grains also have ample coverage on the time axis. For
example, if you choose the smallest kN larger than 2M, you'll guarantee
an overlap rate of at least 50% in the output.
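
The rule above can be sketched as follows, reading "larger than 2M" as a strict inequality (that reading, the function names, and the example periods are assumptions made for illustration):

```python
def grain_length(N, M):
    """Smallest multiple k*N that is strictly larger than 2*M."""
    k = (2 * M) // N + 1
    return k * N

def overlap_rate(grain_len, M):
    """Fraction of each grain shared with the next when grains are M samples apart."""
    return max(0.0, 1.0 - M / grain_len)

N, M = 100, 250                   # hypothetical periods: more than an octave down
L = grain_length(N, M)            # 600, i.e. k = 6
rate_kN = overlap_rate(L, M)      # 1 - 250/600, above 0.5
rate_2N = overlap_rate(2 * N, M)  # 0.0: a 2N grain leaves gaps at this spacing
```

Because the chosen length exceeds 2M, the overlap rate 1 - M/(kN) always stays above one half.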

The problem is that if the grain is longer than 2N, you'll introduce
the original pitch in the resulting spectrum (with higher amplitude the
larger the grain gets), and I don't think that is what you want...

/Rob


Xue

-----Original Message----- From: robert bristow-johnson
Sent: Tuesday, October 22, 2013 12:13 AM
To: A discussion list for music-related DSP
Subject: Re: [music-dsp] PSOLA pitch shifting - resample or not?


hey, thanks for picking this up, Rob.  i am still a little bleary-eyed
from the AES convention that ended yesterday.

On 10/21/13 3:56 AM, Robert Bielik wrote:
Hi again Xue,

Robert Bielik wrote 2013-10-19 16:14:
No. The formant is preserved just by NOT resampling the original
signal. The pitch of the signal is only dependent on the periodicity
of each wave "granule", which is pretty much a windowed snapshot of
the original signal with length 2*N where N is the original
periodicity.

Further to the point, the windowed granule size should be 2*min(N,M)
where N is original periodicity and M is target periodicity.


this is interesting, but i am not so sure i agree with it.  i've always
been going under the assumption that the grain size is 2N, twice the
length of the input period (and overlapping complementary windows so
that at a shift of 0 cents, there is perfect reconstruction of the
original).  but i always thought that if upshift, there would be more
than 2 overlapping grains.  for a maximum of 1 octave up, i've used a
maximum of 4 overlapping grains.
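
To compare the two conventions, here is a small brute-force count of how many grains cover any one output sample, assuming grains are placed every M samples (N = 64 is an arbitrary example period, and the count uses the window's support, including end samples where a Hann window is zero):

```python
def max_overlap(grain_len, hop, n_grains=16):
    """Largest number of grains covering a single output sample."""
    count = [0] * (hop * (n_grains - 1) + grain_len)
    for k in range(n_grains):
        for n in range(grain_len):
            count[k * hop + n] += 1
    return max(count)

N = 64      # hypothetical input period
M = N // 2  # target period for a shift of one octave up

fixed = max_overlap(2 * N, M)             # grain size 2N: 4 grains overlap
adaptive = max_overlap(2 * min(N, M), M)  # grain size 2*min(N, M): 2 grains overlap
unshifted = max_overlap(2 * N, N)         # no shift: 2 grains overlap
```

So a fixed 2N grain needs up to 4 simultaneous grains for one octave up, while the 2*min(N,M) rule keeps the count at 2 by shortening the grain.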

but i am *very* interested to find out if/that my previous M.O. is wrong.

--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews,
dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp
