> I'm a mech.eng., so I've done some applied math, but not a lot in the way of
> filters yet, unfortunately. I do know a bit about lossless compression
> techniques, though, as well as a bit of digital audio stuff and Fourier and
> Laplace transformations. For instance, I know pretty well how JPEG works.
"Willing to learn" as am I, generally. (My wife is beginning to grumble about
my textbook budget. I have to relearn all the math I scoffed at as an
undergraduate...)
>> *Specifically* I'd be looking for someone intimately familiar with direct
>> form II IIR filters, the Levinson-Durbin and Schur algorithms for producing
>> coefficient sets and the underlying linear and discrete math.
>>
>Sorry, I don't know these filter types. Would you have any literature on
>them? What do they do and what does IIR stand for?
IIR filters are 'Infinite Impulse Response' filters. Direct Form II is a
subclass. LPC (Linear Predictive Coding) is a particular use of a special case
of Direct Form II.
"Understanding Digital Signal Processing" (ISBN 0-201-63467-8) by Richard Lyons
has a fairly succinct explanation of IIR filters, the z transform and their
discrete math bases. The explantion is a summary, however, real mathemeticians
would probably want more :-) I'm just an engineer...
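
If it helps to see one in code, here's a minimal Direct Form II second-order
section (a biquad) in C. This is sketched purely for illustration, not taken
from the Vorbis tree, and all the names are invented:

/* Minimal Direct Form II second-order IIR section (biquad).
   Illustrative only; names and layout are made up for this example. */
typedef struct {
    double b0, b1, b2;   /* feed-forward (zero) coefficients */
    double a1, a2;       /* feedback (pole) coefficients; a0 taken as 1 */
    double w1, w2;       /* the two shared delay elements */
} biquad;

/* Push one sample through the filter.  The feedback and feed-forward
   halves share a single delay line, which is what makes this "Form II"
   rather than Form I. */
static double biquad_step(biquad *f, double x)
{
    double w = x - f->a1 * f->w1 - f->a2 * f->w2;          /* poles */
    double y = f->b0 * w + f->b1 * f->w1 + f->b2 * f->w2;  /* zeros */
    f->w2 = f->w1;
    f->w1 = w;
    return y;
}

The feedback half (the a's) is what gives the response its "infinite" tail; an
FIR filter is the same structure with the a's set to zero.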
>LPC = Linear Predicting Codec? (LP very probably means linear predict*, but
>what's the C?)
Coding. Actually 'LP' means something entirely different (Linear Programming,
which I've also been using in Vorbis ;-)
>I'm not sure I understand the reason for the situation you describe, would
>you care to elaborate?
I'm not sure what you're missing, but some playing with IIR filters is probably
in order...
The Levinson-Durbin and Schur algorithms are autocorrelation/reflection-based
methods of generating a filter coefficient set that is optimal given a
particular error criterion. The Vorbis source code makes use of Levinson-Durbin
(the same algorithm as used in GSM, for example). I can dig up references if
you like (the papers are classics; most speech codecs in use today rely on
them).
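
If you'd like the shape of Levinson-Durbin before chasing the papers, here's a
bare-bones textbook version in C. It's a sketch, not the Vorbis routine; the
names are invented, and sign conventions for the coefficients vary from text
to text:

/* Textbook Levinson-Durbin recursion (sketch, not Vorbis code).
   Input:  r[0..m], the first m+1 autocorrelation values of a block.
   Output: a[1..m], LPC coefficients (a[0] implied = 1), and
           k[1..m], the reflection coefficients.
   Returns the final prediction error energy.  Caller allocates a[] and
   k[] with at least m+1 entries; r[0] must be nonzero. */
static double levinson_durbin(const double *r, double *a, double *k, int m)
{
    double err = r[0];
    int i, j;

    for (i = 1; i <= m; i++) {
        /* reflection coefficient for order i */
        double acc = r[i];
        for (j = 1; j < i; j++)
            acc -= a[j] * r[i - j];
        k[i] = acc / err;

        /* update the order-(i-1) coefficients to order i, in place */
        a[i] = k[i];
        for (j = 1; j <= i / 2; j++) {
            double tmp = a[j] - k[i] * a[i - j];
            a[i - j] -= k[i] * a[j];
            a[j] = tmp;
        }

        err *= 1.0 - k[i] * k[i];
    }
    return err;
}

The reflection coefficients k[] are the quantities the Schur algorithm
computes directly; Levinson-Durbin produces them as a by-product on the way to
the predictor coefficients.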
>> LPC filters of a given order generally are able to reproduce roughly
>> equal-width features (this is a generalization) on the order of the size of
>> the spectrum evenly split into bands by the order of the filter.
>>
>This is a bit difficult to understand. Assuming you have an LPC filter of an
>order O, what does the filter function look like (algebraically)?
An LPC filter is a special case of a Direct Form II filter... it looks a bit
like this in the time domain:

y(n) = x(n) + A_1*x(n-1) + A_2*x(n-2) + ... + A_m*x(n-m)

or, equivalently, in the z domain:

Y(z) = X(z) * (1 + A_1*z^-1 + A_2*z^-2 + ... + A_m*z^-m)
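
Applying it to a buffer of samples is just a short dot product against the
previous m samples. A quick illustration (names invented; whether the A_k
carry a plus or a minus sign is a convention that differs between texts):

/* y(n) = x(n) + A_1*x(n-1) + ... + A_m*x(n-m), exactly as above.
   Illustrative only.  Caller guarantees n >= m.  With analysis-style
   coefficients this y is the prediction residual. */
static double lpc_filter_sample(const double *x, long n, const double *A, int m)
{
    double y = x[n];
    int k;
    for (k = 1; k <= m; k++)
        y += A[k] * x[n - k];
    return y;
}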
>> the first half of the unit circle to the spectrum). A classic LPC filter
>> approximates all of the narrow low frequency features with a single very
>> roughly fit formant-like curve.
>>
>What's LSP, please? Apart from that, I have a very rough idea of what you're
>trying to say..
"Line Spectral Pair", (also occasionally called Line Spectral Form, LSF) an
alternate representation of LPC coefficients. LSP is completely equivalent
(and the LPC->LSP trasform is orthogonal). It's a relatively new thing (in all
the latest low bitrate voice compressions) so it can be a bit hard to track
down in the literature if you're not already familiar with it.
http://www2.xtdl.com/~rothwlr/lsfs/lsfcomp.htm has some pointers to find the
math behind it (and code, on which Vorbis's LSP is partially based).
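
For a feel of the transform itself (a sketch only, with nothing taken from the
actual Vorbis implementation, and every name below invented): from a stable
(minimum-phase) order-m LPC polynomial A(z) = 1 + A_1*z^-1 + ... + A_m*z^-m
you form a symmetric polynomial P(z) = A(z) + z^-(m+1)*A(1/z) and an
antisymmetric one Q(z) = A(z) - z^-(m+1)*A(1/z). Their roots all sit on the
unit circle and interlace, and the root angles are the line spectral
frequencies. A crude way to locate them is to scan the circle for sign
changes:

#include <math.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

/* Crude LPC -> LSP sketch (not the Vorbis implementation).
   A[1..m]: LPC coefficients of A(z) = 1 + A_1 z^-1 + ... + A_m z^-m
            (A[0] unused); assumes m <= 62.
   lsp[]:   receives up to m root angles in (0, pi), ascending.
   Returns the number of roots found on a grid of 'grid' points. */
static int lpc_to_lsp_sketch(const double *A, int m, double *lsp, int grid)
{
    double p[64], q[64];                 /* P(z), Q(z); degree m+1 */
    double prev_p = 0.0, prev_q = 0.0;
    int i, n, found = 0;

    p[0] = 1.0;       q[0] = 1.0;        /* build P (symmetric) and */
    for (i = 1; i <= m; i++) {           /* Q (antisymmetric)       */
        p[i] = A[i] + A[m + 1 - i];
        q[i] = A[i] - A[m + 1 - i];
    }
    p[m + 1] = 1.0;   q[m + 1] = -1.0;

    /* On the unit circle P and Q reduce to real cosine/sine sums whose
       zero crossings are the line spectral frequencies.  Scan (0, pi). */
    for (n = 1; n < grid && found < m; n++) {
        double w = M_PI * n / grid;
        double vp = 0.0, vq = 0.0;
        for (i = 0; i <= m + 1; i++) {
            double c = (m + 1) / 2.0 - i;
            vp += p[i] * cos(w * c);
            vq += q[i] * sin(w * c);
        }
        if (n > 1) {
            if (prev_p * vp < 0.0) lsp[found++] = w;  /* root of P */
            if (found < m && prev_q * vq < 0.0)
                lsp[found++] = w;                     /* root of Q */
        }
        prev_p = vp;
        prev_q = vq;
    }
    return found;
}

A real implementation refines each crossing (bisection or Chebyshev
root-finding) rather than just taking the grid point, but the structure is
the same.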
>I believe I understand: Since our perception of audio is log2-based
Not just our perception; nature is overwhelmingly logarithmic :-) It is
generally true that audio sources will tend to have roughly equal energy by
octave. This generalization is less true than most, but it's truer than not.
>and
>therefore higher octaves are wider than low octaves to some power of two,
>high-frequency signals appear wider than corresponding signals in lower
>octaves, so they will be given a higher priority by 'LPC filter generators'.
>Therefore, lower-octave signals are not modelled well at all.
Basically.
>Ah, OK, I understand your approach and it's what I'd have recommended you
>do, too. I do not think there is a way to avoid losses in time-frequency
>domain transformations in the first place, nor when converting between
>linear and logarithmic representations.
Ah, the key that's not understood here is that LPC filters operate in the time
domain and adapt nicely to integer math. Because I can't make my time-domain
log-f LPC filter for real, I'm currently approximating such a beast using a
time->freq domain shift in Vorbis. I'd love to invent a form that avoids the
shift yet gets the same results.
Note that domain shifts are not taboo so long as they're either entirely (not
approximately) orthogonal in practice *or* they're entirely deterministic so
that we can track and record the error. The trick, naturally, is not to end up
with much error :-)
>With computers, everything is
>discrete, just about every calculation introduces loss (unless you're
>dealing with integers only).
Integer only is a good way to go with lossless codecs.
>Now an idea I have is to use a different set of FFT coefficients, or rather,
>to space them in a logarithmic way.
This is what Vorbis does, sort of. The FFT relies on even spacing (else you're
right back to an O(n^2) algorithm), but I can re-cast the spectrum after the
shift since I'm just approximating. This approach won't work for lossless.
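
To make "re-cast the spectrum" concrete (an illustration of the idea only, not
how Vorbis actually handles its spectrum; the band edges and names are
invented): keep the evenly spaced FFT, then fold its power spectrum into
log-spaced bands afterwards:

#include <math.h>

/* Fold an evenly spaced power spectrum bins[0..n-1] (covering 0..nyquist Hz)
   into nbands logarithmically spaced bands between f_lo and nyquist.
   Purely illustrative -- not Vorbis's spectrum handling. */
static void fold_log_bands(const double *bins, int n, double nyquist,
                           double f_lo, int nbands, double *bands)
{
    double ratio = pow(nyquist / f_lo, 1.0 / nbands);  /* per-band ratio */
    int b, i;

    for (b = 0; b < nbands; b++) {
        double lo = f_lo * pow(ratio, b);
        double hi = lo * ratio;
        double acc = 0.0;
        int count = 0;

        for (i = 0; i < n; i++) {
            double f = (double)i * nyquist / n;         /* bin center freq */
            if (f >= lo && f < hi) {
                acc += bins[i];
                count++;
            }
        }
        bands[b] = count ? acc / count : 0.0;           /* mean energy */
    }
}

Low bands may cover less than one FFT bin, which is exactly the resolution
problem under discussion; a real remapping would interpolate rather than leave
the band empty.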
I'd like to invent math to avoid the shift. :-) I don't actually know whether
what I'd like to do most is possible: invent an LPC filter mechanism with
log-scale features. I suspect it isn't, but that's at best a hunch. However,
I'm convinced that there's *some* way around the problem.
Monty
--
MP3 ENCODER mailing list ( http://geek.rcc.se/mp3encoder/ )