On Thu, Mar 12, 2020 at 9:35 PM robert bristow-johnson <r...@audioimagination.com> wrote:
> i am not always persuaded that the analysis window is preserved in the
> frequency-domain modification operation.

It definitely is *not* preserved under modification, generally. The Perfect Reconstruction condition assumes that there is no modification to the coefficients; it's just a basic guarantee that the filterbank is actually able to reconstruct the signal to begin with. The details of the windows/zero-padding determine exactly what happens to all of the block processing artifacts when you modify things.

> if it's a phase vocoder and you do the Miller Puckette thing and apply the
> same phase to an entire spectral peak, then supposedly the window shape is
> preserved on each sinusoidal component.

Even that is only approximate IIRC, in that it assumes well-separated sinusoids or similar. The larger point is that preserving window shape under modification is an exceptional case that requires special handling.

> for analysis, there might be other properties of the window that are more
> important than being complementary.

That's true enough: this isn't as crucial in analysis-only as it is for synthesis. Although I do consider Parseval to be pretty bedrock in terms of DSP intuition, and I would not want to introduce frame-rate modulations into analysis without a clear reason (of which there are many good examples, don't get me wrong).

> if, after analysis, i am modifying each Gaussian pulse and inverse DFT
> back to the time domain, i will have a Gaussian window effectively on the
> output frame. by multiplying by a Hann window and dividing by the original
> Gaussian window, the result has a Hann window shape and that should be
> complementary in the overlap-add.

So, a relevant distinction here is whether an STFT filterbank uses matching analysis and synthesis windows. The PR condition is that their product obeys COLA.
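To make that product-COLA condition concrete, here's a quick numpy sketch of the matched case (the sizes N=1024, H=256 are just for illustration, not anything from the thread):

```python
import numpy as np

N, H = 1024, 256                # frame length and hop size (assumed for illustration)
n = np.arange(N)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)   # periodic Hann: COLA at hop N/4
w_analysis = np.sqrt(hann)      # "square root" type window
w_synthesis = np.sqrt(hann)     # matched case: the product is the Hann window

# Overlap-add the analysis*synthesis product across frames.
num_frames = 16
ola = np.zeros(H * (num_frames - 1) + N)
for k in range(num_frames):
    ola[k * H : k * H + N] += w_analysis * w_synthesis

middle = ola[N:-N]              # ignore the ramp-up/ramp-down at the ends
print(np.allclose(middle, middle[0]))   # constant interior => PR holds
```

The interior of the overlap-add is a flat constant, which is exactly the PR guarantee (absent any modification of the coefficients).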
In the vanilla case, the analysis and synthesis windows are constrained to match (actually they're time-reversals of one another, but that only matters for asymmetric windows). Then the PR condition is COLA on the square of the (common) window, and the appropriate window is of "square root" type, such as cosine. This is a "balanced" design, in that the analyzer and synthesizer play equal roles in the windowing. Note that this matching constraint removes many degrees of freedom from the window design.

In general, for mismatched analysis and synthesis windows, the PR condition is very "loose." For example, you can use literally anything you want for the analysis window, provided the values are finite and non-zero (negative is okay!). Then you can pick any COLA window, and solve for the synthesis window as their ratio. In this way you can design PR filterbanks with arbitrarily bad performance :P

So for the mismatched case, we need some additional design principle(s) to drive the window designs. Offhand, there seem to be two notable approaches. One is that rectangular "windows" are desired on the synthesis side in order to accommodate zero-padding/fast-convolution type operation. Then the analysis window is whatever COLA window you care to use for analysis purposes. As discussed, this is only appropriate when the modification is constrained to be a length-K FIR kernel.

The other is like your Gaussian example, where you want to use a particular window for analysis/modification reasons and then need to square that with the PR condition on the synthesis side. The downside here is that the resulting synthesis windows are not as well behaved in terms of suppressing block processing artifacts: they tend to become heavy-shouldered, exhibit regions of amplification, etc. This can be worth it, but only if you gain enough from the analysis/modification properties.
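The ratio construction for the mismatched case is a one-liner; here's a sketch along the lines of the Gaussian example (frame length, hop, and Gaussian width are all assumed values for illustration):

```python
import numpy as np

N, H = 512, 128                 # frame length and hop size (assumed)
n = np.arange(N)
# Arbitrary analysis window: a Gaussian (finite and nonzero everywhere).
w_analysis = np.exp(-0.5 * ((n - (N - 1) / 2) / (0.25 * N)) ** 2)
hann = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)    # any COLA target window works
# Solve for the synthesis window as the ratio: product is exactly the Hann.
w_synthesis = hann / w_analysis

num_frames = 16
ola = np.zeros(H * (num_frames - 1) + N)
for k in range(num_frames):
    ola[k * H : k * H + N] += w_analysis * w_synthesis

print(np.allclose(ola[N:-N], ola[N]))   # constant interior => PR by construction
```

PR holds by construction no matter how poorly the Gaussian suits the synthesis side, which is the sense in which the mismatched PR condition is "loose."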
> > Rectangular windows are a perfectly valid choice here, albeit one with
> > poor sidelobe suppression.
>
> but it doesn't matter with overlap-add fast convolution. somehow, the
> sidelobe effects come out in the wash, because we can insure (to finite
> precision) the correctness of the output with a time-domain analysis.

Right, the rectangular windows are not being used for spectral estimation in the fast convolution context, so their spectral properties are irrelevant. They just represent a time-domain accounting of what the circular convolution is doing.

> so you're oversampling in the frequency domain because you're zero-padding
> in the time domain.

Correct, zero-padding in the time domain is equivalent to upsampling in the frequency domain.

> > Note that this equivalence happens because we are adding an additional
> > time-variant stage (zero-padding/raw OLA), to explicitly correct for the
> > time-variant effects of the underlying DFT operation. This is the block
> > processing analog of upsampling a scalar signal by K so that we can apply
> > an order-K polynomial nonlinearity without aliasing.
>
> where is this polynomial nonlinearity? and i am still not grokking the
> upsampling.

There's no nonlinearity in STFT; I'm just making an analogy. We have some process that we know will produce a finite amount of "out of band" output, and so we "upsample" by exactly that amount to avoid aliasing. Just as a polynomial nonlinearity requires frequency-domain headroom (upsampling), a circular convolution requires time-domain headroom (zero-padding). It's the same basic engineering idea, just applied to different flavors of aliasing (nonlinearity vs time-variance).

> i think so, but i wonder if i am understanding your "upsampling" and
> zero-padding. we're zero-padding in the time domain (making the length
> longer to the DFT length N). the factor of length increase in the time
> domain is the factor of oversampling in the frequency domain.
> a lot of people make the sub-optimal decision to simply pad zeros equal
> to the length of the frame of audio. then doubling the length would be
> like inserting oversampled frequency DFT bins between each of the
> "original" spectral points.

Right, so as I mentioned before, there are two sources of oversampling available. In a critically sampled design, the DFT size would equal the hop size. This is the minimum required to represent the signal, and anything on top of that represents redundancy in the filterbank representation (and so, additional overhead). Of course, that implies there is no overlap, the "windows" are rectangular, spectral analysis properties are poor, aliasing is not suppressed in synthesis, etc.

So the first layer of oversampling is overlap/windowing. This is powerful because more data goes into each transform, and the windows explicitly attenuate edge effects and control the spectral properties. Still, this represents redundant overhead relative to the minimum required to represent the signal. The other downside of overlap is that it corresponds to latency, and it reduces the time resolution of the transform representation.

(Generally, it is possible to critically sample in the overlapping case, and this is where MDCT comes in. In this case time-domain aliasing occurs even if you *don't* modify the coefficients, and you must design the system to satisfy TDAC in addition to PR. But this is more for coding and less for analysis/pvoc/filtering.)

The less powerful form of oversampling is zero-padding. Since no additional data goes into the transforms, there is no latency cost or real increase in frequency resolution (it just interpolates the DFT of the non-zero-padded case). What it buys is processing margin for applying modifications in the frequency domain. And it can be helpful for analysis if you want a denser sampling of the spectrum for some estimation task.

Ethan
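P.S. Both zero-padding points above are easy to check numerically: the time-domain headroom that makes overlap-add fast convolution exact, and the "inserted interpolated bins" picture for a doubled transform length. A toy sketch, with block and kernel sizes assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# --- zero-padding as time-domain headroom: overlap-add fast convolution ---
x = rng.standard_normal(2048)   # test input (arbitrary data)
h = rng.standard_normal(63)     # length-K FIR kernel
B, K = 256, len(h)              # block size and kernel length (assumed)
Nfft = 512                      # >= B + K - 1: circular conv acts as linear conv
H_f = np.fft.rfft(h, Nfft)

y = np.zeros(len(x) + K - 1)
for start in range(0, len(x), B):
    block = x[start:start + B]
    yb = np.fft.irfft(np.fft.rfft(block, Nfft) * H_f, Nfft)[:B + K - 1]
    y[start:start + B + K - 1] += yb    # raw OLA of the block outputs

print(np.allclose(y, np.convolve(x, h)))   # correct to finite precision

# --- zero-padding as spectral interpolation ---
frame = x[:256]
X = np.fft.fft(frame)                      # original N-point DFT
X_pad = np.fft.fft(frame, 2 * len(frame))  # padded with N zeros ("doubling")
print(np.allclose(X_pad[::2], X))   # original bins survive at even indices
```

With the padding in place the rectangular "windows" contribute nothing spectrally; the output matches direct convolution sample for sample. And in the doubled transform, the odd-indexed bins are exactly the interpolated points inserted between the original ones.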
_______________________________________________
dupswapdrop: music-dsp mailing list
music-dsp@music.columbia.edu
https://lists.columbia.edu/mailman/listinfo/music-dsp