> Which is all fine and good.  But how do you propose to understand or even
> usefully examine the actuality without any knowledge of the theory?  The
> subjectivity of your analysis is going to require that you examine the
> theory, unless you just want to be another one of the guys with colorful
> frequency transform plots that look nice but tell you just about nothing
> about how things sound.

My goal is not to determine how things sound.  See below for more explanation.

> > If I use oggenc on a speech sample today, does it lose any critical
> > (from an acoustic phonetic perspective) information?
> 
> How are you going to judge what is critical and what isn't without using
> psychoacoustics?

By using auditory phonetics.  The field of auditory phonetics attempts to
analyze speech sounds as they travel through the air as waveforms.  Given
a waveform, one can calculate a spectrographic image of the waveform.
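
To make "calculate a spectrographic image" concrete, here is a minimal sketch
of a short-time Fourier magnitude computation in plain Python.  The function
name, the frame length, the hop size and the Hann window are all my own
illustrative choices, not anything oggenc or LAME prescribes, and the test
tone is synthetic rather than real speech.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames; each frame holds magnitudes for bins 0..frame_len//2."""
    # Hann window reduces leakage between neighbouring frequency bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(frame_len)]
        mags = []
        for k in range(frame_len // 2 + 1):
            # Naive DFT: fine for a sketch, far too slow for real recordings.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# Synthetic tone (an assumption for illustration): 1000 Hz sampled at 8000 Hz
# falls exactly on bin 8 of a 64-point frame, since each bin spans 125 Hz.
rate = 8000
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(256)]
spec = spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
```

Each inner list is one column of the spectrogram image; plotting the columns
side by side with magnitude as darkness gives the familiar picture.  For real
recordings an FFT library would replace the naive inner loop.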

Currently, it is well known how the features of speech show up in a
spectrographic plot of that speech's waveform.  What is unknown is what
MP3/OggVorbis do to this spectrogram (i.e., the speech data).

Perhaps (and I suspect) the loss is totally negligible for some bitrate n.
Then the result of encoding a speech sample at bitrate n preserves all of
the features a linguist needs to do auditory phonetic analysis.  The
"interesting" auditory features of a speech waveform occur independently of
the psychoacoustics.  I'm attempting to examine whether MP3/OggVorbis
psychoacoustics fit speech analysis well enough.

For example, if LAME at bitrate n causes much attenuation of white
noise in the lower frequency bands, it's completely unsuitable for
storing speech samples destined for linguistic analysis - low-band
noise is a major distinguisher of certain phonemes.
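
One way to put a number on that kind of attenuation is to compare the energy
in a band of spectrogram bins before and after an encode/decode round trip.
The sketch below is only an illustration: the function names are hypothetical,
the inputs are assumed to be magnitude frames (one list of bin magnitudes per
frame) that are already time-aligned, and the numbers are made up rather than
measured from LAME or any other encoder.

```python
import math


def band_energy(frames, lo_bin, hi_bin):
    """Sum of squared magnitudes over bins lo_bin..hi_bin (inclusive), all frames."""
    return sum(m * m for frame in frames for m in frame[lo_bin:hi_bin + 1])


def band_attenuation_db(original, decoded, lo_bin, hi_bin):
    """Energy lost in a band, in dB; positive means the decoded copy is quieter."""
    e_orig = band_energy(original, lo_bin, hi_bin)
    e_dec = band_energy(decoded, lo_bin, hi_bin)
    if e_orig == 0 or e_dec == 0:
        raise ValueError("band is empty in one of the inputs")
    return 10 * math.log10(e_orig / e_dec)


# Made-up magnitude frames (2 frames x 4 bins), not measurements of any encoder.
original = [[1.0, 2.0, 2.0, 1.0], [1.0, 2.0, 2.0, 1.0]]
decoded = [[1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0]]  # bins 1-2 halved
loss = band_attenuation_db(original, decoded, 1, 2)
untouched = band_attenuation_db(original, decoded, 0, 0)
```

Halving the magnitudes in a band quarters its energy, so the made-up example
reports roughly 6 dB of loss in bins 1-2 and none in bin 0.  How many dB of
loss a given phoneme contrast can tolerate is exactly the open question here.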

> I suspect you may want to approach a different question.  And no, I'm not
> really trying to deter you from doing this sort of study - I just want you
> to be aware of what you're getting into if you want to do the study in a
> useful way.  In fact, I'd love to know why MP3 (using the ATT psych model)
> does a particularly poor job of handling fricative sounds in speech, since
> I'm trying to patch that right now.  But I don't know how you're going to
> come to any reasonable conclusions without a pretty heavy dose of
> psychoacoustic and audio signal processing theory.

I'm not examining "why".  In part I wish I were.  Unfortunately I'm an
undergraduate Mathematics student, not a graduate linguistics student, and I
only have so much time.  The immediate question that my professor would like
answered is: "If I want to publish a library of speech online, is there some
reasonable format I can use that will keep the necessary speech features, and
preserve the academic viability of said speech?  Cause it would be nice if I
didn't have to post gigantic .wav files...."  Perhaps if I ever go on to
grad school for linguistics I'd take the tack you propose.  Just not now ::-).

> And I say that as someone who has done speech & speaker recognition, MPEG
> audio, and audio quality analysis work for the last eight or so years in a
> variety of capacities.

Any qualitative analysis?  My professor insists no one has done a rigorous
study of speech preservation through MPEG/OggVorbis compression.  Can I tell
him he's wrong?


Ross Vandegrift
[EMAIL PROTECTED]
_______________________________________________
mp3encoder mailing list
[EMAIL PROTECTED]
http://minnie.tuhs.org/mailman/listinfo/mp3encoder
