> > First, it's meaningless to talk about bit depth alone
I agree with the points you raise, and I'd like to add that you can also
trade bandwidth for bits.

On Fri, Mar 28, 2014 at 8:31 PM, Sampo Syreeni <de...@iki.fi> wrote:

> On 2014-03-28, robert bristow-johnson wrote:
>
>> 14 bits??? i seriously disagree. i dunno about you, but i still listen
>> to red-book CDs (which are 2-channel, uncompressed 16-bit fixed-point).
>> they would sound like excrement if not well dithered when mastered to
>> the 16-bit medium.
>
> I'd argue the same. First, it's meaningless to talk about bit depth
> alone. What we can hear is dictated first by absolute amplitude. If the
> user turns the knob to eleven, the number of bits doesn't matter: at
> some point you'll hear the noise floor, and any distortion products
> produced by quantization. That will even happen without user
> intervention when your work is used in a sampler, and because of things
> like broadcast compressors. Second, at that point you'll also hear noise
> modulation, which sounds pretty nasty in things like reverb tails, which
> always go to zero in the end. And third, people can hear stuff well
> below the noise floor. Even if the floor is set so low that you can hear
> it but don't really mind it, distortion products can still be clearly
> audible, and, coming from hard quantization, rather annoying.
>
> A fourth reason, which might not be too important in audio DSP but sure
> can be in measurement and detection processes, is how linear your
> circuits actually are. As soon as you apply things like matched filters,
> statistical tests or classification engines to data, those things don't
> have an absolute threshold of hearing at all, and especially with binary
> decisions, can latch onto arbitrarily faint spurs caused by
> quantization.
> An audio-relevant example of that might be given by digital
> watermarking, which is *supposed* to be inaudible, or, say, audio
> forensics, where you purposely try to unmask otherwise inaudible content
> in audio, or even things like audio coders, which shouldn't but still
> can be inordinately sensitive to inaudible statistical features of
> sound. (E.g. MP3 is ridiculously sensitive to high-quality, uncorrelated
> stereo reverb. That's effectively just colored noise to the ear, and the
> precise time structure doesn't matter at all, but once you compress it,
> it eats so much bandwidth that 192 kbps often ceases to be transparent.)
>
> All that means that, just to be sure, it makes sense to be principled
> and always apply proper dither no matter how many bits you have, or at
> the latest when your bits leave the signal chain you personally control
> and whose gain structure you can engineer. Certainly with any 16-bit
> format, because we already know it takes something like 21 to 22 bits to
> cover the whole dynamic range of the human ear.
>
> In that vein, I should probably say a couple of words about subtractive
> dither, which is my particular interest. The audio standard is additive
> TPDF dither at two quantization levels peak to peak. That's because the
> process is one-sided, so that it's easy to apply, and that amount and
> shape are in a certain sense optimal. The theory goes that a rectangular
> dither at one level P2P decouples the first moment of the error signal
> from the utility signal, adding a second similar dither signal decouples
> the second moment, and so on to infinity. Two independent 1RPDF signals
> summed means the result is white and independent, and its PDF is the
> convolution of the two rectangular ones, yielding the standard 2TPDF.
> That's sufficient for audio use because the second moment is just
> variance, so that decoupling it kills noise modulation.
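To make the moment argument concrete, here's a quick numerical check
(Python sketch; the ~1/4 LSB^2 figure is the classic sum of the two 1/12
LSB^2 RPDF variances plus the 1/12 LSB^2 quantizer variance):

```python
import random
import statistics

def quantize_tpdf(x, rng):
    # additive 2-LSB peak-to-peak TPDF dither: sum of two independent
    # 1-LSB RPDF draws, then hard rounding to the integer grid
    d = rng.random() + rng.random() - 1.0
    return round(x + d)

rng = random.Random(1)
for x in (0.1, 0.25, 0.49):   # arbitrary constant inputs, in LSB units
    errs = [quantize_tpdf(x, rng) - x for _ in range(200_000)]
    m = statistics.mean(errs)
    v = statistics.pvariance(errs)
    # first two error moments are decoupled from the signal: mean ~ 0
    # and variance ~ 1/4 LSB^2 regardless of x, i.e. no noise modulation
    print(f"x={x}: mean={m:+.4f}, var={v:.4f}")
```

Without the dither the error would be the deterministic round(x) - x,
entirely signal-dependent; with it, only the first two moments are
controlled, which is exactly why the second moment (variance) being flat
is what kills noise modulation.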
> You can't hear the difference beyond that, but the analysis is nifty in
> that it shows you which precise statistical assumptions you can make
> about the noise floor, and e.g. that a Gaussian dither signal -- which
> is the limit of an infinite number of 1RPDF signals added, by the
> central limit theorem -- is never an ideal dither, because its amplitude
> would have to be infinite as well if you want to decouple the first,
> most important moments fully.
>
> The fun thing about subtractive dither is that in that case 1RPDF is
> already perfect wrt all moments, and if you add anything more, it won't
> hurt, because it will be subtracted out just the same -- except in
> ridiculous amounts, when headroom becomes an issue. Based on that I've
> even been coding a little something which I'm hoping some anal
> audiophile might even find reassuring enough to use. The idea is to do
> subtractive dither but with 2TPDF. The point is, if you can't decode it,
> it still works as a compatible additive format. If you can decode it,
> it's ideal and perfect, with all of the subtractive benefits, such as no
> accumulation in a long signal chain. So much is old news, but then the
> tricky part is to actually make it efficient enough and usable in the
> wild.
>
> The way I go about it right now is to use an efficient xor-shift RNG
> which is periodically rekeyed from a kind of randomness extractor
> operating in a closed loop over the data stream. That means that if you
> have a signal but aren't sure whether it's using the system (blindly
> subtracting the dither would lead to additive noise), the system
> quasi-periodically self-synchronizes, and after that you can see whether
> the dither signal is there by doing an approximate, straight correlation
> with the generated dither stream.
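Since your actual code isn't shown, here's my toy reading of that loop,
just to check I follow it. The generator choice, the sync pattern, and
the rekey rule are all invented for illustration; the real system would
differ in every constant:

```python
MASK = 0xFFFFFFFF

def xorshift32(s):
    # one step of Marsaglia's xorshift32 (a stand-in; the post doesn't
    # say which xor-shift variant is actually used)
    s ^= (s << 13) & MASK
    s ^= s >> 17
    s ^= (s << 5) & MASK
    return s

def parity(code):
    # toy randomness extractor: bit parity of the emitted word, which is
    # unchanged by appending trailing zero bits (wider containers)
    return bin(code & MASK).count("1") & 1

def run(stream, decode=False):
    # one shared loop for encoder and decoder; both sides start in the
    # same state here, so the self-synchronization step is NOT modeled
    s, extractor, out = 1, 0, []
    for v in stream:
        s = xorshift32(s)
        a = s / 2**32
        s = xorshift32(s)
        d = a + s / 2**32 - 1.0    # 2TPDF dither, 2 LSB peak to peak
        if decode:
            code = v
            out.append(code - d)   # subtract the very same dither
        else:
            code = round(v + d)    # subtractively dithered quantizer
            out.append(code)
        # closed loop over the *emitted* words, so a decoder can
        # reproduce it; rekey quasi-periodically at a ~1/256-per-sample
        # sync pattern (all constants arbitrary)
        extractor = ((extractor << 1) | parity(code)) & MASK
        if extractor & 0xFF == 0xA5:
            s = extractor
    return out
```

Encoding and then decoding leaves at most half an LSB of plain rounding
error per sample, with no dither noise left over, which is the
subtractive payoff; a decoder that doesn't subtract just sees an
ordinary 2TPDF-dithered stream.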
> All of that means the system becomes pretty robust in the face of cut
> and paste, formats which can't flag its use via metadata (say, CDs), and
> even varying bit widths in the channel, such as seen in libsndfile.
> (The extractor is just a parity over the sample, so it's immune to
> trailing zeroes, and the registration is sample accurate, so that if
> you lose sync in the middle of the stream, it will be reacquired
> uniquely at the next sync point; those occur at a set probability per
> sample, based on what the extractor lifts from the stream. Also there
> are a couple of minor twists so that the thing doesn't rekey on silent
> channels, etc.) Right now the main problems to be solved are efficiency
> in software, state size of the RNG in hardware, avoiding any
> oscillation from the rekey mechanism through the RNG loop in realistic
> conditions, and finalizing the correlator code in a form which doesn't
> use multiplication; it might be that I have to ditch the bigger
> xor-shift generator for something leaner and perhaps nonlinear.
>
> Comments, anybody?
>
>> in fact, i think that in a very real manner, Stan Lipshitz and John
>> Vanderkooy and maybe their grad student, Robert Wannamaker, did no less
>> than *save* the red-book CD format in the late 80s, early 90s. and they
>> did it without touching the actual format. same 44.1 kHz, same
>> 2 channels, same 16-bit fixed-point PCM words. they did it by
>> optimizing the quantization to 16 bits, and they did that with (1)
>> dithering the quantization and (2) noise-shaping the quantization.
>
> OTOH that interpretation is in my opinion an overstatement. Sure,
> principled dithering in converters and whatnot solves the low-amplitude
> harshness problem wholesale, but in practice most CDs produced even
> before additive TPDF dither became the norm were sufficiently,
> perceptually speaking, autodithered by the broadband signal or the
> external noise floor.
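For anyone following along, the Lipshitz/Vanderkooy recipe in its
simplest possible form is just TPDF dither inside a first-order
error-feedback loop. This is only a structural sketch of mine; the
shapers actually used for mastering are higher-order and
psychoacoustically weighted:

```python
import random

def requantize(samples, rng):
    # TPDF dither plus first-order error-feedback noise shaping: the
    # previous total quantization error is subtracted from the next
    # input, pushing the error spectrum toward high frequencies
    out, e = [], 0.0
    for x in samples:             # x in LSB units of the 16-bit grid
        w = x - e
        d = rng.random() + rng.random() - 1.0   # 2-LSB p-p TPDF dither
        y = round(w + d)          # the 16-bit word that hits the disc
        e = y - w                 # total error, fed back next sample
        out.append(y)
    return out
```

Because the fed-back error telescopes, the long-run average of the
output tracks the input to within a couple of counts over the whole run,
while the error power is pushed up to where the ear is less sensitive.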
>> the idea is to get the very best 16-bit words you can outa audio that
>> has been recorded, synthesized, processed, and mixed to a much higher
>> precision. i'm still sorta agnostic about float v. fixed, except that
>> i had shown that for the standard IEEE 32-bit floating format (which
>> has 8 exponent bits), you do better with 32-bit fixed as long as the
>> headroom you need is less than 40 dB. if all you need is 12 dB headroom
>> (and why would anyone need more than that?) you will have 28 dB better
>> S/N ratio with 32-bit fixed-point.
>
> Personally I like 32-bit floats. They have 24 bits of mantissa, which
> would already be sufficient for transparent fixed point. They're
> already perfect, with the exponent giving the dynamic headroom for
> resonant filters and whatnot. Now if only you could easily use them
> *as* fixed point numbers *too*... I mean, things like truncated IIRs
> call for precise control of the exact rounding error, which you can't
> really achieve with floats in a dependable manner and using clean code.
>
> --
> Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
> +358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>
> --
> dupswapdrop -- the music-dsp mailing list and website:
> subscription info, FAQ, source code archive, list archive, book
> reviews, dsp links
> http://music.columbia.edu/cmc/music-dsp
> http://music.columbia.edu/mailman/listinfo/music-dsp
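PS: rbj's float-versus-fixed figures are easy to sanity-check
empirically. This sketch compares RMS quantization error for a unit
sine stored as IEEE binary32 versus 32-bit fixed point with full scale
placed 12 dB up; the level conventions here are my assumptions, so the
exact dB figures are ballpark only:

```python
import math
import struct

def to_f32(x):
    # round a double to the nearest IEEE 754 binary32
    return struct.unpack("f", struct.pack("f", x))[0]

def to_fixed32(x, headroom_db):
    # 32-bit two's-complement fixed point, with full scale set
    # headroom_db above the nominal unit signal level
    lsb = 10 ** (headroom_db / 20) / 2**31
    return round(x / lsb) * lsb

def snr_db(xs, qs):
    sig = sum(x * x for x in xs)
    err = sum((x - q) ** 2 for x, q in zip(xs, qs))
    return 10 * math.log10(sig / err)

xs = [math.sin(2 * math.pi * i / 997) for i in range(100_000)]  # unit sine
float_snr = snr_db(xs, [to_f32(x) for x in xs])
fixed_snr = snr_db(xs, [to_fixed32(x, 12) for x in xs])
# with only 12 dB of headroom, fixed point wins by a few tens of dB,
# in the same ballpark as rbj's 28 dB figure
print(f"float32: {float_snr:.1f} dB, fixed32 @ 12 dB: {fixed_snr:.1f} dB")
```

The intuition: the fixed-point noise floor is constant, so every dB of
headroom you don't need is a dB of S/N you keep, while float32's noise
rides about 24 bits below the instantaneous signal regardless of level.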