On Friday 14 December 2007 09:01, Tim Goetze wrote:
> Indeed, the chorus/aliasing effect is as good as gone with
> --no-peaklock, but the vocals begin to suffer from periodic amplitude
> modulation (tremolo).

I can believe that. Crispness 2 or 3 (combined with --no-peaklock) might get rid of that -- you're essentially trading off tricks and artifacts that are "useful" in certain cases against others.

Let me expand a bit on that and on Rubber Band in general.

First off, there's nothing "new" about any of the techniques used, and Rubber Band doesn't claim to be the best sounding timestretcher out there. Much as I would like it to be, if I'd set out with that intention I would never have got anything released.

What Rubber Band does aim to do is fill the gap in the free software ecology for a timestretching library that sounds good enough for general musical use, and that also meets the other requirements that make it useful in practical applications, such as the capability for sample-exact stretching, real-time safety, known latency, the ability to change ratios dynamically, support for any number of channels at any sample rate, and not blowing up too easily when faced with extreme ratios. Though perhaps not all of those at once (e.g. it isn't quite sample-exact in real-time mode).

In terms of DSP, Rubber Band is a phase vocoder (standard STFT analysis/synthesis with phase-unwrapping) with a few additional techniques, all of which have their tradeoffs:

* Phase locking to peak frequencies (after Laroche & Dolson 1999). This reduces the vagueness and phasiness that the phase vocoder introduces at ratios relatively close to 1, but makes everything sound metallic at long stretches. Rubber Band reduces the region of influence around each peak as the frequency decreases (otherwise the bass quickly goes out of tune) and cuts down gradually on the amount of locking it does as the stretch increases (though it seems not enough). This technique is the one you're switching off with --no-peaklock. Long stretches tend to make their own demands on phase to produce satisfying (as opposed to strictly accurate) results -- cf. Paul Nasca's paulstretch, designed for very very long stretch factors, which randomizes phases altogether.

* Phase resynchronisation -- resetting the synthesis phases from the raw analysis phases -- at noisy transients (a simplistic take on Duxbury et al. 2002). This is usually an effective technique when stretching things that have crisp transients, particularly drum loops and the like. It doesn't work so well in some other cases. The most serious cases are sounds with mostly stable frequencies (e.g. a smooth vocal) together with something that may not be loud but is perceived by the stretcher as having transient attacks (e.g. acoustic guitar): the stretcher will exaggerate the guitar onsets and leave a corresponding tremolo in the vocal. This technique also sounds very bad if the transient locations are mis-identified (particularly if they're picked up one or two FFT frames too late, as can happen with some sorts of transient). To switch this off, use --no-transients. Rubber Band also supports a band limited mode (--bl-transients) which resets phases only outside the most likely range for low order harmonics; this can sound better in some situations, but it will also lose you some ongoing phase coherence again.

* Variable stretch factor -- reducing the amount of stretch around transients and increasing it in relatively "still" sections. This can improve the transient sound over a plain phase vocoder, even with --no-transients, but it can also lead to mis-timing if Rubber Band fails to identify a transient or if the timing within a "still" section is unusually important. You can disable this with --precise.

With --precise --no-peaklock --no-transients, you should have pretty much a classic phase vocoder.

I can think of a small number of potential improvements to make, but tuning this stuff is quite hard -- push on one side and something pops out on the other.
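
As a rough illustration of the "classic phase vocoder" baseline and of the phase resynchronisation idea from the list above, here is a minimal Python sketch of the per-bin phase update. It is not taken from Rubber Band's source: the names princarg and advance_phases, their parameters, and the transient_frames argument are all hypothetical, and peak locking and the variable stretch factor are left out entirely.

```python
import math


def princarg(phase):
    """Wrap a phase value (radians) into the range [-pi, pi)."""
    return (phase + math.pi) % (2.0 * math.pi) - math.pi


def advance_phases(analysis_phases, bin_index, fft_size,
                   analysis_hop, synthesis_hop, transient_frames=()):
    """Return synthesis phases for one FFT bin over successive frames.

    Hypothetical helper for illustration only (not Rubber Band's API).
    analysis_phases holds the measured phase of this bin in each frame;
    transient_frames lists frame indices where the synthesis phase is
    reset to the raw analysis phase (phase resynchronisation).
    """
    # Nominal angular frequency of the bin centre, in radians per sample.
    omega = 2.0 * math.pi * bin_index / fft_size
    out = [analysis_phases[0]]
    for t in range(1, len(analysis_phases)):
        if t in transient_frames:
            # Resynchronise: discard accumulated phase at a transient.
            out.append(analysis_phases[t])
            continue
        # Phase unwrapping: how far the measured phase advance deviates
        # from what a sinusoid exactly at the bin centre would produce.
        expected = omega * analysis_hop
        deviation = princarg(
            analysis_phases[t] - analysis_phases[t - 1] - expected)
        # Estimated true frequency of the component in this bin.
        true_freq = omega + deviation / analysis_hop
        # Advance the synthesis phase by that frequency over the
        # (longer or shorter) synthesis hop.
        out.append(out[-1] + true_freq * synthesis_hop)
    return out
```

With synthesis_hop twice analysis_hop this corresponds to a 2x stretch: a steady sinusoid keeps its estimated frequency, and only the spacing between frames changes. Passing frame indices in transient_frames shows why a reset in a bin that also carries a stable partial can break that partial's phase continuity.
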
Almost every change you make that improves one test case seems to result in a deterioration in another, and quality is more subjective than you might expect. There are plenty of papers out there that claim improvements in performance but actually produce lousy results for most real music.

> At what stretching/compressing ratios have you run your tests, and
> with what kind of source material?

Some tests with "individual track" sources (individual instruments, drum loops etc) and some with "complex mixture" sources (folk songs, pop songs etc). As you surmise, this is mostly with ratios in the "time correction" range, up to about 25% either way. I did a few runs with longer ratios like 3x and 5x, but I've been less picky about the results. Time stretching only -- I actually haven't run any listening tests at all on pitch shifting.

> While the code is certainly doing quite fine, I have a feeling that my 2x
> test runs could be a bit out of line with the intended kind of use.

That's not outside the intended range, but it may be a bit of a weak spot. The peaklock reduction doesn't strongly kick in until rather long ratios, so you will probably get less metallic results (though fuzzier) by default at 3x than at 2x.

Chris

_______________________________________________
Linux-audio-dev mailing list
http://lists.linuxaudio.org/mailman/listinfo/linux-audio-dev
