> I'm a mech.eng., so I've done some applied math, but not a lot in the way of
> filters yet, unfortunately. I do know a bit about lossless compression
> techniques, though, as well as a bit of digital audio stuff and Fourier and
> Laplace transformations. For instance, I know pretty well how JPEG works.
"Willing to learn" as am I, generally. (My wife is beginning to grumble about
my textbook budget. I have to relearn all the math I scoffed at as an
undergraduate...)
>> *Specifically* I'd be looking for someone intimately familiar with direct
>> form II IIR filters, the Levinson-Durbin and Schur algorithms for producing
>> coefficient sets and the underlying linear and discrete math.
>>
>Sorry, I don't know these filter types. Would you have any literature on
>them? What do they do and what does IIR stand for?
IIR filters are 'Infinite Impulse Response' filters. Direct Form II is a
subclass. LPC (Linear Predictive Coding) is a particular use of a special case
of Direct Form II.
"Understanding Digital Signal Processing" (ISBN 0-201-63467-8) by Richard Lyons
has a fairly succinct explanation of IIR filters, the z transform and their
discrete math bases. The explantion is a summary, however, real mathemeticians
would probably want more :-) I'm just an engineer...
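
If it helps to see one in code, here's a minimal Direct Form II second-order
section (a biquad) in C. This is sketched purely for illustration, not taken
from the Vorbis tree, and all the names are invented:

/* Minimal Direct Form II second-order IIR section (biquad).
   Illustrative only; names and layout are made up for this example. */
typedef struct {
    double b0, b1, b2;   /* feed-forward (zero) coefficients */
    double a1, a2;       /* feedback (pole) coefficients; a0 taken as 1 */
    double w1, w2;       /* the two shared delay elements */
} biquad;

/* Push one sample through the filter.  The feedback and feed-forward
   halves share a single delay line, which is what makes this "Form II"
   rather than Form I. */
static double biquad_step(biquad *f, double x)
{
    double w = x - f->a1 * f->w1 - f->a2 * f->w2;          /* poles */
    double y = f->b0 * w + f->b1 * f->w1 + f->b2 * f->w2;  /* zeros */
    f->w2 = f->w1;
    f->w1 = w;
    return y;
}

The feedback half (the a's) is what gives the response its "infinite" tail; an
FIR filter is the same structure with the a's set to zero.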
>LPC = Linear Predicting Codec? (LP very probably means linear predict*, but
>what's the C?)
Coding. Actually 'LP' means something entirely different (Linear Programming,
which I've also been using in Vorbis ;-)
>I'm not sure I understand the reason for the situation you describe, would
>you care to elaborate?
I'm not sure what you're missing, but some playing with IIR filters is probably
in order...
The Levinson-Durbin and Schur algorithms are autocorrelation/reflection-based
methods of generating a filter coefficient set that is optimal given a
particular error criterion. The Vorbis source code makes use of Levinson-Durbin
(the same algorithm as used in GSM, for example). I can dig up references if
you like (the papers are classics; most speech codecs in use today rely on
them).
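
If you'd like the shape of Levinson-Durbin before chasing the papers, here's a
bare-bones textbook version in C. It's a sketch, not the Vorbis routine; the
names are invented, and sign conventions for the coefficients vary from text
to text:

/* Textbook Levinson-Durbin recursion (sketch, not Vorbis code).
   Input:  r[0..m], the first m+1 autocorrelation values of a block.
   Output: a[1..m], LPC coefficients (a[0] implied = 1), and
           k[1..m], the reflection coefficients.
   Returns the final prediction error energy.  Caller allocates a[] and
   k[] with at least m+1 entries; r[0] must be nonzero. */
static double levinson_durbin(const double *r, double *a, double *k, int m)
{
    double err = r[0];
    int i, j;

    for (i = 1; i <= m; i++) {
        /* reflection coefficient for order i */
        double acc = r[i];
        for (j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        k[i] = acc / err;

        /* update the order-(i-1) coefficients to order i, in place */
        a[i] = k[i];
        for (j = 1; j <= i / 2; j++) {
            double tmp = a[j] - k[i] * a[i - j];
            a[i - j] -= k[i] * a[j];
            a[j] = tmp;
        }

        err *= 1.0 - k[i] * k[i];
    }
    return err;
}

The reflection coefficients k[] are the quantities the Schur algorithm
computes directly; Levinson-Durbin produces them as a by-product on the way to
the predictor coefficients.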
>> LPC filters of a given order generally are able to reproduce roughly
>> equal-width features (this is a generalization) on the order of the size of
>> the spectrum evenly split into bands by the order of the filter.
>>
>This is a bit difficult to understand. Assuming you have an LPC filter of an
>order O, what does the filter function look like (algebraically)?
An LPC filter is a special case of a Direct Form II filter... it looks a bit
like this in the time domain:

y(n) = x(n) + A_1*x(n-1) + A_2*x(n-2) + ... + A_m*x(n-m)

or, equivalently, in the z domain:

Y(z) = X(z) * (1 + A_1*z^-1 + A_2*z^-2 + ... + A_m*z^-m)
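
Applying it to a buffer of samples is just a short dot product against the
previous m samples. A quick illustration (names invented; whether the A_k
carry a plus or a minus sign is a convention that differs between texts):

/* y(n) = x(n) + A_1*x(n-1) + ... + A_m*x(n-m), exactly as above.
   Illustrative only.  Caller guarantees n >= m.  With analysis-style
   coefficients this y is the prediction residual. */
static double lpc_filter_sample(const double *x, long n, const double *A, int m)
{
    double y = x[n];
    int k;
    for (k = 1; k <= m; k++)
        y += A[k] * x[n - k];
    return y;
}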
>> the first half of the unit circle to the spectrum). A classic LPC filter
>> approximates all of the narrow low frequency features with a single very
>> roughly fit formant-like curve.
>>
>What's LSP, please? Apart from that, I have a very rough idea of what you're
>trying to say..
"Line Spectral Pair", (also occasionally called Line Spectral Form, LSF) an
alternate representation of LPC coefficients. LSP is completely equivalent
(and the LPC->LSP trasform is orthogonal). It's a relatively new thing (in all
the latest low bitrate voice compressions) so it can be a bit hard to track
down in the literature if you're not already familiar with it.
http://www2.xtdl.com/~rothwlr/lsfs/lsfcomp.htm has some pointers to find the
math behind it (and code, on which Vorbis's LSP is partially based).
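
For a feel of the transform itself (a sketch only, with nothing taken from the
actual Vorbis implementation, and every name below invented): from a stable
(minimum-phase) order-m LPC polynomial A(z) = 1 + A_1*z^-1 + ... + A_m*z^-m
you form a symmetric polynomial P(z) = A(z) + z^-(m+1)*A(1/z) and an
antisymmetric one Q(z) = A(z) - z^-(m+1)*A(1/z). Their roots all sit on the
unit circle and interlace, and the root angles are the line spectral
frequencies. A crude way to locate them is to scan the circle for sign
changes:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Crude LPC -> LSP sketch (not the Vorbis implementation).
   A[1..m]: LPC coefficients of A(z) = 1 + A_1 z^-1 + ... + A_m z^-m
            (A[0] unused); assumes m <= 62.
   lsp[]:   receives up to m root angles in (0, pi), ascending.
   Returns the number of roots found on a grid of 'grid' points. */
static int lpc_to_lsp_sketch(const double *A, int m, double *lsp, int grid)
{
    double p[64], q[64];                 /* P(z), Q(z); degree m+1 */
    double prev_p = 0.0, prev_q = 0.0;
    int i, n, found = 0;

    p[0] = 1.0;       q[0] = 1.0;        /* build P (symmetric) and */
    for (i = 1; i <= m; i++) {           /* Q (antisymmetric)       */
        p[i] = A[i] + A[m + 1 - i];
        q[i] = A[i] - A[m + 1 - i];
    }
    p[m + 1] = 1.0;   q[m + 1] = -1.0;

    /* On the unit circle P and Q reduce to real cosine/sine sums whose
       zero crossings are the line spectral frequencies.  Scan (0, pi). */
    for (n = 1; n < grid && found < m; n++) {
        double w = M_PI * n / grid;
        double vp = 0.0, vq = 0.0;
        for (i = 0; i <= m + 1; i++) {
            double c = (m + 1) / 2.0 - i;
            vp += p[i] * cos(w * c);
            vq += q[i] * sin(w * c);
        }
        if (n > 1) {
            if (prev_p * vp < 0.0) lsp[found++] = w;  /* root of P */
            if (found < m && prev_q * vq < 0.0)
                lsp[found++] = w;                     /* root of Q */
        }
        prev_p = vp;
        prev_q = vq;
    }
    return found;
}

A real implementation refines each crossing (bisection or Chebyshev
root-finding) rather than just taking the grid point, but the structure is
the same.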
>I believe I understand: Since our perception of audio is log2-based
Not just our perception; nature is overwhelmingly logarithmic :-) It is
generally true that audio sources will tend to have roughly equal energy by
octave. This generalization is less true than most, but it's truer than not.
>and
>therefore higher octaves are wider than low octaves to some power of two,
>high-frequency signals appear wider than corresponding signals in lower
>octaves, so they will be given a higher priority by 'LPC filter generators'.
>Therefore, lower-octave signals are not modelled well at all.
Basically.
>Ah, OK, I understand your approach and it's what I'd have recommended you
>do, too. I do not think there is a way to avoid losses in time-frequency
>domain transformations in the first place, nor when converting between
>linear and logarithmic representations.
Ah, the key that's not understood here is that LPC filters operate in the time
domain and adapt nicely to integer math. Because I can't make my time-domain
log-f LPC filter for real, I'm currently approximating such a beast using a
time->freq domain shift in Vorbis. I'd love to invent a form that avoids the
shift yet gets the same results.
Note that domain shifts are not taboo so long as they're either entirely (not
approximately) orthogonal in practice *or* they're entirely deterministic so
that we can track and record the error. The trick, naturally, is not to end up
with much error :-)
>With computers, everything is
>discrete, just about every calculation introduces loss (unless you're
>dealing with integers only).
Integer only is a good way to go with lossless codecs.
>Now an idea I have is to use a different set of FFT coefficients, or rather,
>to space them in a logarithmic way.
This is what Vorbis does, sort of. The FFT relies on even spacing (else you're
right back to an O(n^2) algorithm), but I can re-cast the spectrum after the
shift since I'm just approximating. This approach won't work for lossless.
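
To make "re-cast the spectrum" concrete (an illustration of the idea only, not
how Vorbis actually handles its spectrum; the band edges and names are
invented): keep the evenly spaced FFT, then fold its power spectrum into
log-spaced bands afterwards:

#include <math.h>

/* Fold an evenly spaced power spectrum bins[0..n-1] (covering 0..nyquist Hz)
   into nbands logarithmically spaced bands between f_lo and nyquist.
   Purely illustrative -- not Vorbis's spectrum handling. */
static void fold_log_bands(const double *bins, int n, double nyquist,
                           double f_lo, int nbands, double *bands)
{
    double ratio = pow(nyquist / f_lo, 1.0 / nbands);  /* per-band ratio */
    int b, i;

    for (b = 0; b < nbands; b++) {
        double lo = f_lo * pow(ratio, b);
        double hi = lo * ratio;
        double acc = 0.0;
        int count = 0;

        for (i = 0; i < n; i++) {
            double f = (double)i * nyquist / n;         /* bin center freq */
            if (f >= lo && f < hi) {
                acc += bins[i];
                count++;
            }
        }
        bands[b] = count ? acc / count : 0.0;           /* mean energy */
    }
}

Low bands may cover less than one FFT bin, which is exactly the resolution
problem under discussion; a real remapping would interpolate rather than leave
the band empty.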
I'd like to invent math to avoid the shift. :-) I don't actually know whether
what I'd like to do most is possible: invent an LPC filter mechanism with
log-scale features. I suspect it isn't, but that's at best a hunch. However,
I'm convinced that there's *some* way around the problem.
Monty
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )