> > First, it's meaningless to talk about bit depth alone
I agree with the points you raise, and I'd like to add that you can also
trade bandwidth for bits.

On Fri, Mar 28, 2014 at 8:31 PM, Sampo Syreeni <de...@iki.fi> wrote:

> On 2014-03-28, robert bristow-johnson wrote:
>
>> 14 bits??? i seriously disagree. i dunno about you, but i still listen
>> to red-book CDs (which are 2-channel, uncompressed 16-bit fixed-point).
>> they would sound like excrement if not well dithered when mastered to
>> the 16-bit medium.
>
> I'd argue the same. First, it's meaningless to talk about bit depth
> alone. What we can hear is dictated first by absolute amplitude. If the
> user turns the knob to eleven, the number of bits doesn't matter: at
> some point you'll hear the noise floor, and any distortion products
> produced by quantization. That will even happen without user
> intervention when your work is used in a sampler, and because of things
> like broadcast compressors. Second, at that point you'll also hear noise
> modulation, which sounds pretty nasty in things like reverb tails, which
> always go to zero in the end. And third, people can hear stuff well
> below the noise floor. Even if the floor is set so low that you can hear
> it but don't really mind it, distortion products can still be clearly
> audible, and, coming from hard quantization, rather annoying.
>
> A fourth reason, which might not be too important in audio DSP but sure
> can be in measurement and detection processes, is how linear your
> circuits actually are. As soon as you apply things like matched filters,
> statistical tests or classification engines to data, those things don't
> have an absolute threshold of hearing at all, and especially with binary
> decisions, can latch onto arbitrarily faint spurs caused by
> quantization.
> An audio-relevant example of that might be given by digital
> watermarking, which is *supposed* to be inaudible, or, say, audio
> forensics, where you purposely try to unmask otherwise inaudible content
> in audio, or even things like audio coders, which shouldn't but still
> can be inordinately sensitive to inaudible statistical features of
> sound. (E.g. MP3 is ridiculously sensitive to high-quality, uncorrelated
> stereo reverb. That's effectively just colored noise to the ear, and the
> precise time structure doesn't matter at all, but once you compress it,
> it eats so much bandwidth that 192 kbps often ceases to be transparent.)
>
> All that means that, just to be sure, it makes sense to be principled
> and always apply proper dither no matter how many bits you have, or at
> the latest when your bits leave the signal chain you personally control
> and whose gain structure you can engineer. Certainly with any 16-bit
> format, because we already know it takes something like 21 to 22 bits to
> cover the whole dynamic range of the human ear.
>
> In that vein, I should probably say a couple of words about subtractive
> dither, which is my particular interest. The audio standard is additive
> TPDF dither at two quantization levels peak to peak. That's because the
> process is one-sided, so that it's easy to apply, and that amount and
> shape are in a certain sense optimal. The theory goes that a rectangular
> dither at one level P2P decouples the first moment of the error signal
> from the utility signal, adding a second similar dither signal decouples
> the second moment, and so on to infinity. Two independent 1RPDF signals
> summed means the result is white and independent, and its PDF is the
> convolution of the two rectangular ones, yielding the standard 2TPDF.
> That's sufficient for audio use because the second moment is just
> variance, so that decoupling it kills noise modulation.
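To make the moment argument concrete, here's a quick numerical check
(Python sketch; the ~1/4 LSB^2 figure is the classic sum of the two 1/12
LSB^2 RPDF variances plus the 1/12 LSB^2 quantizer variance):

```python
import random
import statistics

def quantize_tpdf(x, rng):
    # additive 2-LSB peak-to-peak TPDF dither: sum of two independent
    # 1-LSB RPDF draws, then hard rounding to the integer grid
    d = rng.random() + rng.random() - 1.0
    return round(x + d)

rng = random.Random(1)
for x in (0.1, 0.25, 0.49):   # arbitrary constant inputs, in LSB units
    errs = [quantize_tpdf(x, rng) - x for _ in range(200_000)]
    m = statistics.mean(errs)
    v = statistics.pvariance(errs)
    # first two error moments are decoupled from the signal: mean ~ 0
    # and variance ~ 1/4 LSB^2 regardless of x, i.e. no noise modulation
    print(f"x={x}: mean={m:+.4f}, var={v:.4f}")
```

Without the dither the error would be the deterministic round(x) - x,
entirely signal-dependent; with it, only the first two moments are
controlled, which is exactly why the second moment (variance) being flat
is what kills noise modulation.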
> You can't hear the difference beyond that, but the analysis is nifty in
> that it shows you which precise statistical assumptions you can make
> about the noise floor, and e.g. that a Gaussian dither signal -- which
> is the limit of an infinite number of 1RPDF signals added, by the
> central limit theorem -- is never an ideal dither, because its amplitude
> would have to be infinite as well if you want to decouple the first,
> most important moments fully.
>
> The fun thing about subtractive dither is that in that case 1RPDF is
> already perfect wrt all moments, and if you add anything more, it won't
> hurt, because it will be subtracted out just the same -- except in
> ridiculous amounts, when headroom becomes an issue. Based on that I've
> even been coding a little something which I'm hoping some anal
> audiophile might even find reassuring enough to use. The idea is to do
> subtractive dither but with 2TPDF. The point is, if you can't decode it,
> it still works as a compatible additive format. If you can decode it,
> it's ideal and perfect, with all of the subtractive benefits, such as no
> accumulation in a long signal chain. So much is old news, but then the
> tricky part is to actually make it efficient enough and usable in the
> wild.
>
> The way I go about it right now is to use an efficient xor-shift RNG
> which is periodically rekeyed from a kind of randomness extractor
> operating in a closed loop over the data stream. That means that if you
> have a signal but aren't sure whether it's using the system (blindly
> subtracting the dither would lead to additive noise), the system
> quasi-periodically self-synchronizes, and after that you can see whether
> the dither signal is there by doing an approximate, straight correlation
> with the generated dither stream.
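Since your actual code isn't shown, here's my toy reading of that loop,
just to check I follow it. The generator choice, the sync pattern, and
the rekey rule are all invented for illustration; the real system would
differ in every constant:

```python
MASK = 0xFFFFFFFF

def xorshift32(s):
    # one step of Marsaglia's xorshift32 (a stand-in; the post doesn't
    # say which xor-shift variant is actually used)
    s ^= (s << 13) & MASK
    s ^= s >> 17
    s ^= (s << 5) & MASK
    return s

def parity(code):
    # toy randomness extractor: bit parity of the emitted word, which is
    # unchanged by appending trailing zero bits (wider containers)
    return bin(code & MASK).count("1") & 1

def run(stream, decode=False):
    # one shared loop for encoder and decoder; both sides start in the
    # same state here, so the self-synchronization step is NOT modeled
    s, extractor, out = 1, 0, []
    for v in stream:
        s = xorshift32(s)
        a = s / 2**32
        s = xorshift32(s)
        d = a + s / 2**32 - 1.0    # 2TPDF dither, 2 LSB peak to peak
        if decode:
            code = v
            out.append(code - d)   # subtract the very same dither
        else:
            code = round(v + d)    # subtractively dithered quantizer
            out.append(code)
        # closed loop over the *emitted* words, so a decoder can
        # reproduce it; rekey quasi-periodically at a ~1/256-per-sample
        # sync pattern (all constants arbitrary)
        extractor = ((extractor << 1) | parity(code)) & MASK
        if extractor & 0xFF == 0xA5:
            s = extractor
    return out
```

Encoding and then decoding leaves at most half an LSB of plain rounding
error per sample, with no dither noise left over, which is the
subtractive payoff; a decoder that doesn't subtract just sees an
ordinary 2TPDF-dithered stream.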
> All of that means the system becomes pretty robust in the face of cut
> and paste, formats which can't flag its use via metadata (say, CDs), and
> even varying bit widths in the channel, such as seen in libsndfile.
> (The extractor is just a parity over the sample, so it's immune to
> trailing zeroes, and the registration is sample accurate, so that if
> you lose sync in the middle of the stream, it will be reacquired
> uniquely at the next sync point; those occur at a set probability per
> sample, based on what the extractor lifts from the stream. Also there
> are a couple of minor twists so that the thing doesn't rekey on silent
> channels, etc.) Right now the main problems to be solved are efficiency
> in software, state size of the RNG in hardware, avoiding any
> oscillation from the rekey mechanism through the RNG loop in realistic
> conditions, and finalizing the correlator code in a form which doesn't
> use multiplication; it might be that I have to ditch the bigger
> xor-shift generator for something leaner and perhaps nonlinear.
>
> Comments, anybody?
>
>> in fact, i think that in a very real manner, Stan Lipshitz and John
>> Vanderkooy and maybe their grad student, Robert Wannamaker, did no less
>> than *save* the red-book CD format in the late 80s, early 90s. and they
>> did it without touching the actual format. same 44.1 kHz, same
>> 2 channels, same 16-bit fixed-point PCM words. they did it by
>> optimizing the quantization to 16 bits, and they did that with (1)
>> dithering the quantization and (2) noise-shaping the quantization.
>
> OTOH that interpretation is in my opinion an overstatement. Sure,
> principled dithering in converters and whatnot solves the low-amplitude
> harshness problem wholesale, but in practice most CDs produced even
> before additive TPDF dither became the norm were sufficiently,
> perceptually speaking, autodithered by the broadband signal or the
> external noise floor.
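For anyone following along, the Lipshitz/Vanderkooy recipe in its
simplest possible form is just TPDF dither inside a first-order
error-feedback loop. This is only a structural sketch of mine; the
shapers actually used for mastering are higher-order and
psychoacoustically weighted:

```python
import random

def requantize(samples, rng):
    # TPDF dither plus first-order error-feedback noise shaping: the
    # previous total quantization error is subtracted from the next
    # input, pushing the error spectrum toward high frequencies
    out, e = [], 0.0
    for x in samples:             # x in LSB units of the 16-bit grid
        w = x - e
        d = rng.random() + rng.random() - 1.0   # 2-LSB p-p TPDF dither
        y = round(w + d)          # the 16-bit word that hits the disc
        e = y - w                 # total error, fed back next sample
        out.append(y)
    return out
```

Because the fed-back error telescopes, the long-run average of the
output tracks the input to within a couple of counts over the whole run,
while the error power is pushed up to where the ear is less sensitive.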
>> the idea is to get the very best 16-bit words you can outa audio that
>> has been recorded, synthesized, processed, and mixed to a much higher
>> precision. i'm still sorta agnostic about float v. fixed, except that
>> i had shown that for the standard IEEE 32-bit floating format (which
>> has 8 exponent bits), you do better with 32-bit fixed as long as the
>> headroom you need is less than 40 dB. if all you need is 12 dB headroom
>> (and why would anyone need more than that?) you will have 28 dB better
>> S/N ratio with 32-bit fixed-point.
>
> Personally I like 32-bit floats. They have 24 bits of mantissa, which
> would already be sufficient for transparent fixed point. They're
> already perfect, with the exponent giving the dynamic headroom for
> resonant filters and whatnot. Now if only you could easily use them
> *as* fixed point numbers *too*... I mean, things like truncated IIRs
> call for precise control of the exact rounding error, which you can't
> really achieve with floats in a dependable manner and using clean code.
>
> --
> Sampo Syreeni, aka decoy - de...@iki.fi, http://decoy.iki.fi/front
> +358-40-3255353, 025E D175 ABE5 027C 9494 EEB0 E090 8BA9 0509 85C2
>
> --
> dupswapdrop -- the music-dsp mailing list and website:
> subscription info, FAQ, source code archive, list archive, book
> reviews, dsp links
> http://music.columbia.edu/cmc/music-dsp
> http://music.columbia.edu/mailman/listinfo/music-dsp
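PS: rbj's float-versus-fixed figures are easy to sanity-check
empirically. This sketch compares RMS quantization error for a unit
sine stored as IEEE binary32 versus 32-bit fixed point with full scale
placed 12 dB up; the level conventions here are my assumptions, so the
exact dB figures are ballpark only:

```python
import math
import struct

def to_f32(x):
    # round a double to the nearest IEEE 754 binary32
    return struct.unpack("f", struct.pack("f", x))[0]

def to_fixed32(x, headroom_db):
    # 32-bit two's-complement fixed point, with full scale set
    # headroom_db above the nominal unit signal level
    lsb = 10 ** (headroom_db / 20) / 2**31
    return round(x / lsb) * lsb

def snr_db(xs, qs):
    sig = sum(x * x for x in xs)
    err = sum((x - q) ** 2 for x, q in zip(xs, qs))
    return 10 * math.log10(sig / err)

xs = [math.sin(2 * math.pi * i / 997) for i in range(100_000)]  # unit sine
float_snr = snr_db(xs, [to_f32(x) for x in xs])
fixed_snr = snr_db(xs, [to_fixed32(x, 12) for x in xs])
# with only 12 dB of headroom, fixed point wins by a few tens of dB,
# in the same ballpark as rbj's 28 dB figure
print(f"float32: {float_snr:.1f} dB, fixed32 @ 12 dB: {fixed_snr:.1f} dB")
```

The intuition: the fixed-point noise floor is constant, so every dB of
headroom you don't need is a dB of S/N you keep, while float32's noise
rides about 24 bits below the instantaneous signal regardless of level.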