RE: [MP3 ENCODER] Lossy Audio Compression Research

M. Alexander Broadhead Fri, 16 Nov 2001 12:54:08 -0800

Howdy Ross,

> By using auditory phonetics.  The field of auditory phonetics attempts to
> analyze speech sounds as they travel through the air in waveforms.  Given
> a waveform, one can calculate a spectrographic image of the waveform.


Is that different than a spectrogram?  Just curious.
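
For readers following along: the "spectrographic image" described above is
usually computed as a short-time Fourier transform. The block below is only a
minimal sketch of that idea in plain Python. The frame length, hop size, Hann
window, and test tone are illustrative assumptions, not anything specified in
this thread.

```python
import cmath
import math

def hann(n):
    """Symmetric Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT bins of one frame."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        mags.append(abs(acc))
    return mags

def spectrogram(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum per windowed, overlapping frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append(dft_magnitudes(frame))
    return frames

# Toy input (assumed): a sine wave centred on bin 8 of a 64-point frame.
rate = 8000
freq = 8 * rate / 64  # 1000 Hz
signal = [math.sin(2 * math.pi * freq * t / rate) for t in range(256)]

spec = spectrogram(signal)
# Index of the strongest frequency bin in the first frame.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each row of `spec` is one time slice and each column one frequency bin, which
is the grid a spectrogram plot displays. A real analysis would use an FFT
library rather than this direct DFT, which is far too slow for long recordings.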

> Currently, it is well known what features of speech cause in a
> spectrographic plot of that speech's waveform.  What is unknown is what
> MP3/OggVorbis do to this spectrogram (i.e., the speech data).
>
> Perhaps (and I suspect) the loss is totally negligible for some bitrate n.
> Then the result of encoding a speech sample at bitrate n preserves all of
> the features a linguist needs to do auditory phonetic analysis.  The
> "interesting" auditory features of a speech waveform happen independent of
> the psychoacoustics.  I'm attempting to examine if MP3/OggVorbis
> psychoacoustics fit speech analysis well enough.
>
> For example, if LAME at bitrate n causes much attenuation of white
> noise in the lower frequency bands, it's completely unsuitable for
> storing speech samples destined for linguistic analysis - low-band
> noise is a major distinguisher of certain phonemes.

Ah.  You're right then.  This is a different problem, and probably one that
doesn't require too much understanding of psychoacoustics.  You just want to
know whether a particular set of features which aren't necessarily relevant
to speech quality are preserved by coding.  Strange question, but I see why
you want to know.
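
One rough way to check whether a feature such as low-band noise survives coding
is to compare per-band energy before and after an encode/decode round trip. The
sketch below assumes the decoded signal has already been time-aligned with the
original and trimmed to the same length (codecs typically introduce a delay).
The function names, band edges, and toy signals are hypothetical, and the
"decoded" signal is simulated rather than produced by a real encoder.

```python
import cmath
import math

def power_spectrum(samples):
    """Power of each non-negative-frequency DFT bin."""
    n = len(samples)
    return [abs(sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def band_energy_change_db(original, decoded, rate, bands):
    """Energy change in dB (decoded relative to original) per (lo_hz, hi_hz) band.

    Assumes both signals have the same length and are time-aligned.
    Returns None for a band that is empty in the original.
    """
    n = len(original)
    p_orig = power_spectrum(original)
    p_dec = power_spectrum(decoded)
    changes = []
    for lo, hi in bands:
        bins = [k for k in range(len(p_orig)) if lo <= k * rate / n < hi]
        e_orig = sum(p_orig[k] for k in bins)
        e_dec = sum(p_dec[k] for k in bins)
        if e_orig == 0:
            changes.append(None)
        elif e_dec == 0:
            changes.append(float("-inf"))
        else:
            changes.append(10 * math.log10(e_dec / e_orig))
    return changes

# Toy data (assumed): a 250 Hz tone plus a 2000 Hz tone. The stand-in for the
# decoded signal halves the low tone and leaves the high tone untouched.
rate, n = 8000, 128
low = [math.sin(2 * math.pi * 250 * t / rate) for t in range(n)]
high = [math.sin(2 * math.pi * 2000 * t / rate) for t in range(n)]
original = [a + b for a, b in zip(low, high)]
decoded = [0.5 * a + b for a, b in zip(low, high)]

changes = band_energy_change_db(original, decoded, rate,
                                [(0, 1000), (1000, 4000)])
```

Halving an amplitude corresponds to roughly a 6 dB drop in energy, so the first
band should come out near -6 dB and the second near 0 dB. Which bands matter,
and how much change is tolerable for phonetic analysis, is exactly the open
question in this thread.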

What is this kind of analysis used for, BTW?  Automatic speech recognition?
(I mostly did _speaker_ or voice recognition, so phonemic identification
wasn't that important to me.)

> I'm not examining "why".  In part I wish I were.  Unfortunately I'm an
> undergraduate Mathematics student, not a graduate linguistics student, and I
> only have so much time.

I'm not sure this is really the sort of thing covered in linguistics?  (Or
mathematics, for that matter.)  I came to what I do via music
(electroacoustics) and electrical engineering (DSP).  I think there are some
programs in 'machine translation' (in the context of linguistics) and/or CS
that hit parts of it as well.  Actually, I generally find the whole thing to
be so multi-disciplinary (and such a small niche compared to video) that I
doubt _anyone_ comes to it directly.

> The immediate question that my professor would like answered is: "If I
> want to publish a library of speech online, is there some reasonable
> format I can use that will keep the necessary speech features, and
> preserve the academic viability of said speech?  Cause it would be nice if
> I didn't have to post gigantic .wav files...."  Perhaps if I ever go on to
> grad school for linguistics I'd take the tack you propose.
> Just not now ::-).

Well, if nothing else, I believe there are at least a few lossless audio
compression algorithms that will save significant disk space.  I'm not real
familiar with them, but others on this list are.

> > And I say that as someone who has done speech & speaker recognition, MPEG
> > audio, and audio quality analysis work for the last eight or so years in a
> > variety of capacities.
>
> Any qualitative analysis?

Sure - I worked for a year at a company that makes equipment that it sells
to wireless phone makers and cellular network companies.  Their boxes are
supposed to, among other things, analyze the quality of the received speech.
It was there that I really started to get a good understanding of just how
hard that is to do.

> My professor insists no one has done a rigorous study of speech
> preservation through MPEG/OggVorbis compression.  Can I tell him he's wrong?

I think your professor is probably right.  Your question sounds very
specific to the visual? or automatic? analysis methods he's/you're using.  I
suppose it might have come up before, but I'd bet your professor would be
more aware of any studies than people who just work on codecs.

Sorry to give you a hard time - I've just heard too many people talk about
speech technology as if it were a new, unstudied field, and watched a lot of
effort go into reinventing various wheels.

Thanks for the explanation,
Alex


_______________________________________________
mp3encoder mailing list
[EMAIL PROTECTED]
http://minnie.tuhs.org/mailman/listinfo/mp3encoder
