Yes, it's quite right that you can't recover a and b from a+b alone - half the information is missing.

However, if we remember that the frames overlap, things are a little different. Say you also know b+c: then, to guess a, b and c from a+b and b+c, only 1/3 of the information is missing. If you know all the sums through to y+z, then only 1/26 is missing, and from that you can make a pretty good guess at a--z. Perfect recovery can be achieved with a minor trick, e.g. fade in your sequence so that a=0.
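
A quick sketch of that recovery in Python (just my own illustration of the
arithmetic, not taken from any of the applications discussed below):

import numpy as np

# original values a..f, with the fade-in trick applied: the first one is 0
x = np.array([0.0, 0.3, -1.2, 0.7, 2.0, -0.5])
sums = x[:-1] + x[1:]                    # the overlapping sums a+b, b+c, c+d, ...

recovered = [0.0]                        # start from the known a = 0
for s in sums:
    recovered.append(s - recovered[-1])  # each sum gives away the next value

print(np.allclose(recovered, x))         # True: perfect recovery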

Whether or not the application tries to recover a--z is another matter. Even if it doesn't, a--z plus a reverb probably sounds fairly close to a--z. But if it wants to, it can.

X.

-----Original Message----- From: Charles Henry
Sent: Tuesday, April 24, 2012 6:19 PM
To: A discussion list for music-related DSP
Subject: Re: [music-dsp] Window presum synthesis

On Mon, Apr 23, 2012 at 2:57 AM, Domagoj Šarić <dsar...@gmail.com> wrote:
On 20 April 2012 17:15, Charles Henry <czhe...@gmail.com> wrote:
Don't let it bother you too much.  I can tell by looking at it--This
is a stupid algorithm.

I sort of regret those words--it just seems so basic in terms of math
that I don't see much about it that's remarkable.

It does seem strange and counterintuitive at first glance, but it's
hard to simply dismiss it once you've seen it examined in several
respectable books (this http://hdl.lib.byu.edu/1877/etd157 is also an
often-referenced ~300 page paper dedicated solely to the subject in
question) and especially once you've _heard_ it (Richard Dobson's
free, open-sourced pvoc effects).

Okay--just don't call it "more precise" when I'm listening or you'll
get my opinion :)  PVOC is probably a good application for these
averaged FFTs.  The reconstructed signal only needs some resemblance
to the original signal.

This doesn't give you greater precision in the frequency domain, it
just makes the results more localized to the center of the interval in
the time domain.  It smooths out the response a bit, but this is
really a *loss* of precision.

Exactly, and this is precisely what the technique tries to accomplish:
to still give you the greater subband rejection, but without the extra
frequency detection precision (or with reduced precision)...

- fold them to N time domain samples (i.e. simply add the first and
second half of the input data)

When you do this, you can no longer reconstruct the original spectrum.
You don't know which interval the values come from.  This is sort of
like averaging the spectra of adjacent N-point FFTs.

Obviously (i.e. by listening to Dobson's results) you somehow can :)

When I say reconstruction--I mean exact reconstruction mathematically.
I don't see it here.

- take an N point FFT
- do some processing with the "more precise spectrum"

An equivalent algorithm: apply the windowing on 2N, FFT, then throw
away each odd-numbered sample of the result.  (I'll leave it to you to
see why this is true--use the un-normalized DFT definition).
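
A quick numerical check of that equivalence (my own numpy sketch, not
something from the thread or from the book mentioned below); it also
confirms the "sum of adjacent N-point FFTs" observation above:

import numpy as np

N = 8
x = np.random.randn(2 * N)
xw = np.hanning(2 * N) * x       # any 2N-point analysis window times the data

folded = xw[:N] + xw[N:]         # "fold": add the first and second half
A = np.fft.fft(folded)           # N-point FFT of the folded frame
B = np.fft.fft(xw)[::2]          # even bins of the 2N-point FFT

print(np.allclose(A, B))         # True: same as throwing away the odd bins
# and, by linearity of the DFT, it is also just the sum of the two
# adjacent N-point FFTs:
print(np.allclose(A, np.fft.fft(xw[:N]) + np.fft.fft(xw[N:])))  # True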

I know, that's exactly how Fredric Harris explains the idea in
chapter 8 (Time Domain Signal Processing with the DFT) of the
"Handbook of Digital Signal Processing - Engineering Applications"
[Elliott, 1987]...


Then, ask the authors why they think it's valuable to throw away half
of their samples and make it so you can't reconstruct the original
signal :)

Because "half their samples" are not valuable, they are (made)
redundant by the windowing procedure (i.e. the wide main lobe of the
window used covers several bins which thus carry duplicated
information). IOW this procedure tries to do something very similar as
zero padding the FFT but without the use of larger FFTs...

Zero-padding preserves all of the original dimensions.  It's a function
from R^m to R^n where m<n.  Folding goes the other way: it maps R^n to
R^m, and then you can't reconstruct the original signal--even though
each folded sample is a sum of samples from the original frame (the
duplicated information).  There's just no linear combination that can
map those values back to the original data (yeah... I suppose I'm
preaching to the choir now... I'll stop :)
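
To make the dimension argument concrete, a small sketch (again mine, in
numpy): zero-padding leaves the original data recoverable, folding does not:

import numpy as np

N = 8
x = np.random.randn(N)

# Zero-pad N -> 2N (R^m -> R^n, m < n): nothing is lost; the original
# N-point spectrum sits untouched in the even bins of the padded spectrum.
X_pad = np.fft.fft(np.concatenate([x, np.zeros(N)]))
print(np.allclose(np.fft.fft(x), X_pad[::2]))       # True: recoverable

# Fold 2N -> N (R^n -> R^m): many different inputs give the same folded
# frame, so the original cannot be reconstructed from it.
y = np.random.randn(2 * N)
y2 = y.copy()
y2[0] += 1.0
y2[N] -= 1.0                                        # a different signal...
print(np.allclose(y[:N] + y[N:], y2[:N] + y2[N:]))  # ...same folded frame: True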

- take an N point IFFT
- and now what? :) we've got N time domain samples that correspond to
the folded input samples...I can't imagine it would sound good if this
is simply window-overlap-added and sent to output as is...
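
For what it's worth, a minimal numpy sketch of that step (mine, assuming no
spectral processing in between): the N-point IFFT only hands back the folded,
time-aliased frame, not the original 2N windowed samples:

import numpy as np

N = 8
x = np.random.randn(2 * N)
xw = np.hanning(2 * N) * x                # windowed 2N-sample frame
folded = xw[:N] + xw[N:]                  # presum/fold to N samples
y = np.fft.ifft(np.fft.fft(folded))       # analysis FFT, then synthesis IFFT

print(np.allclose(y, folded))             # True: you only get the folded frame,
# so any overlap-add resynthesis has to cope with this time aliasing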

I can't imagine why either.

And yet, and yet... :)

It's just whether the technique is appropriate for the task at hand.
I was also reading your other thread--but I didn't understand these
threads were related.

Chuck
--
dupswapdrop -- the music-dsp mailing list and website:
subscription info, FAQ, source code archive, list archive, book reviews, dsp links
http://music.columbia.edu/cmc/music-dsp
http://music.columbia.edu/mailman/listinfo/music-dsp