> Which is all fine and good. But how do you propose to understand or even
> usefully examine the actuality without any knowledge of the theory? The
> subjectivity of your analysis is going to require that you examine the
> theory, unless you just want to be another one of the guys with colorful
> frequency transform plots that look nice but tell you just about nothing
> about how things sound.

My goal is not to determine how things sound. See below for more
explanation.

> > If I use oggenc
> > on a speech sample today, does it lose any critical (from an
> > acoustic phonetic
> > perspective) information?
>
> How are you going to judge what is critical and what isn't without using
> psychoacoustics?

By using auditory phonetics. The field of auditory phonetics attempts to
analyze speech sounds as they travel through the air as waveforms. Given a
waveform, one can calculate a spectrographic image of it. It is well known
how the features of speech show up in a spectrographic plot of that
speech's waveform. What is unknown is what MP3/OggVorbis do to this
spectrogram (i.e., the speech data).

Perhaps (and I suspect) the loss is totally negligible at some bitrate n.
In that case, encoding a speech sample at bitrate n preserves all of the
features a linguist needs to do auditory phonetic analysis.

The "interesting" auditory features of a speech waveform occur
independently of the psychoacoustics. I'm attempting to examine whether
the MP3/OggVorbis psychoacoustic models fit speech analysis well enough.
For example, if LAME at bitrate n heavily attenuates white noise in the
lower frequency bands, it's completely unsuitable for storing speech
samples destined for linguistic analysis - low-band noise is a major
distinguisher of certain phonemes.

> I suspect you may want to approach a different question. And no, I'm not
> really trying to deter you from doing this sort of study - I just want you
> to be aware of what you're getting into if you want to do the study in a
> useful way. In fact, I'd love to know why MP3 (using the ATT psych model)
> does a particularly poor job of handling fricative sounds in speech, since
> I'm trying to patch that right now. But I don't know how you're going to
> come to any reasonable conclusions without a pretty heavy dose of
> psychoacoustic and audio signal processing theory.

I'm not examining "why".
In part I wish I were. Unfortunately I'm an undergraduate mathematics
student, not a graduate linguistics student, and I only have so much time.
The immediate question that my professor would like answered is: "If I want
to publish a library of speech online, is there some reasonable format I
can use that will keep the necessary speech features, and preserve the
academic viability of said speech? Cause it would be nice if I didn't have
to post gigantic .wav files...."

Perhaps if I ever go on to grad school for linguistics I'd take the tack
you propose. Just not now ::-).

> And I say that as someone who has done speech & speaker recognition, MPEG
> audio, and audio quality analysis work for the last eight or so years in a
> variety of capacities.

Any qualitative analysis? My professor insists no one has done a rigorous
study of speech preservation through MPEG/OggVorbis compression. Can I
tell him he's wrong?

Ross Vandegrift
[EMAIL PROTECTED]
_______________________________________________
mp3encoder mailing list
[EMAIL PROTECTED]
http://minnie.tuhs.org/mailman/listinfo/mp3encoder
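
P.S. To make the low-band noise check described above concrete, here is a
minimal sketch of the kind of measurement I have in mind. It is only an
illustration under stated assumptions, not a finished tool: the function
names (frame_power_spectrum, band_energy_db, band_attenuation_db) are
hypothetical, both signals are assumed to be mono sample lists at the same
rate and roughly time-aligned (encoders add some delay), and the direct DFT
is used only to avoid dependencies; an FFT library would be the practical
choice for real recordings.

```python
import cmath
import math


def frame_power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hann-windowed frame, direct DFT."""
    n = len(frame)
    windowed = [
        x * (0.5 - 0.5 * math.cos(2 * math.pi * i / n))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum


def band_energy_db(samples, rate, lo_hz, hi_hz, frame_len=256, hop=128):
    """Mean per-frame energy, in dB, of the band [lo_hz, hi_hz)."""
    total = 0.0
    frames = 0
    for start in range(0, len(samples) - frame_len + 1, hop):
        power = frame_power_spectrum(samples[start:start + frame_len])
        for k, p in enumerate(power):
            freq = k * rate / frame_len  # centre frequency of bin k
            if lo_hz <= freq < hi_hz:
                total += p
        frames += 1
    if frames == 0:
        raise ValueError("signal is shorter than one frame")
    # Small offset keeps log10 defined if the band is completely silent.
    return 10 * math.log10(total / frames + 1e-12)


def band_attenuation_db(original, decoded, rate, lo_hz, hi_hz):
    """Positive result: the decoded signal lost energy in the band."""
    return (band_energy_db(original, rate, lo_hz, hi_hz)
            - band_energy_db(decoded, rate, lo_hz, hi_hz))
```

The idea would be to decode the compressed file back to PCM, then call
band_attenuation_db(original, decoded, rate, lo_hz, hi_hz) for each band of
phonetic interest and each bitrate n. As a sanity check, a "decoded" signal
that is simply the original at half amplitude should come out about 6 dB
down in every band. What counts as an acceptable loss is a question for the
phoneticians, not something this sketch decides.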
