[gnuspeech-contact] Re: Adjustment to the Carré DRM model

David Hill Wed, 12 Nov 2008 18:35:15 -0800

Hi Dalmazio,

On Nov 10, 2008, at 9:13 PM, Dalmazio Brisinda wrote:

Very cool! Much of this sounds quite similar to what we were talking about over a month ago re: computation of the resonant function based on a smoothly changing radial interpolation function depending on where in the tube we were, but especially at boundaries. In this case, they use MRI data for this function.

As I mentioned before, the snag is that more sections would be needed, the sample rate would increase, and computation speed would likely be an issue again.
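To make the sections/sample-rate trade-off concrete, here is a minimal sketch assuming a waveguide-style tube in which each of N equal-length sections delays the travelling wave by exactly one sample in each direction. Under that assumption the section length is c/fs, so fs = N·c/L. The 17.5 cm tract length and c = 350 m/s are illustrative values, not gnuspeech's actual parameters, and `sample_rate_hz` is a hypothetical helper.

```python
def sample_rate_hz(n_sections: int, tract_length_m: float,
                   speed_of_sound_m_s: float = 350.0) -> float:
    """Sample rate needed if each of n_sections equal sections delays the wave
    by one sample (one way). Section length = c / fs, hence fs = N * c / L.
    Hypothetical helper for illustration only."""
    return n_sections * speed_of_sound_m_s / tract_length_m

# Assumed tract length of 17.5 cm: doubling the section count doubles fs.
for n in (10, 20, 40):
    print(n, round(sample_rate_hz(n, 0.175)))
```

Because fs grows linearly with N, every extra section also adds per-sample work, so computation per second of speech grows roughly with N squared.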

Just had a slightly playful thought: I wonder if there is MRI data for samples limited to aesthetically pleasing male and female voices (separately). I'm sure there would be some physiological differences between taking the average of MRI data over a large 'random' sample vs. limiting it to just 'attractive' samples.

Voice quality has more to do with the glottal excitation function (including intonation) than with vocal tract shape, though some vocal tract effects are pleasing, like *clarity* of articulation, on which we still don't have a good handle. (Some speakers seem to adjust their articulation to maximise clarity by adjusting the formants for best effect, but not in a voluntary way. I got that from Walter Lawrence himself.)

So, I'm curious: what were the subjective results like? I would suspect much smoother-sounding synthesis, and therefore greater intelligibility.

Good topic for a PhD thesis :-) Intelligibility is not synonymous with better-quality synthesis. DECtalk (MITalk) is pretty intelligible but very tiring to listen to for long periods, which is probably due in large part to its unnatural rhythm and intonation.

That shifting of the zero crossings/DRM boundaries towards the lips is also interesting: a 60/40 weighting for the length of the back half vs. the length of the front half. Is anyone looking into incorporating these two changes into gnuspeech? The 60/40 weighting change would probably not be too difficult. The change involving MRI data to create a non-uniform radial function sounds a little more involved, though, but very interesting!
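One crude way to try the 60/40 weighting is sketched below, under a loudly stated assumption: the symmetric uniform-tube DRM boundaries are stretched piecewise-linearly so that the glottis-to-midpoint half occupies 60% of the tract and the midpoint-to-lips half 40%. This is NOT how Story obtained his boundaries (his come from sensitivity functions of an MRI-derived area function); `remap_boundaries` and `UNIFORM_BOUNDARIES` are hypothetical names, with positions given as fractions of tract length measured from the glottis.

```python
from fractions import Fraction as F

# Interior DRM boundaries of a uniform closed-open tube, as fractions of tract
# length from the glottis (zero crossings of the F1-F3 sensitivity functions).
UNIFORM_BOUNDARIES = [F(1, 10), F(1, 6), F(3, 10), F(1, 2),
                      F(7, 10), F(5, 6), F(9, 10)]

def remap_boundaries(boundaries, back_fraction):
    """Hypothetical piecewise-linear stretch: [0, 1/2] maps onto
    [0, back_fraction] and [1/2, 1] maps onto [back_fraction, 1]."""
    half = F(1, 2)
    out = []
    for x in boundaries:
        if x <= half:
            out.append(x * back_fraction / half)           # back (pharyngeal) half
        else:
            out.append(back_fraction + (x - half) * (1 - back_fraction) / half)  # front half
    return out

shifted = remap_boundaries(UNIFORM_BOUNDARIES, F(3, 5))
print([str(x) for x in shifted])
# ['3/25', '1/5', '9/25', '3/5', '19/25', '13/15', '23/25']
```

With `back_fraction = 1/2` the mapping is the identity, so the original symmetric model is recovered; every boundary moves toward the lips as the fraction grows, matching the direction of the shift reported in the abstract.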

The real point is that the "rest" state of the tube is non-uniform, but it produces formants similar to those of a uniform tube. This means the boundaries of the DRM regions are shifted from the original theory, and the radii have to be different in the rest state. This almost certainly means, again, that more sections are needed. It would not be that easy, but it needs to be looked at.


Warm regards.

david
--------
David Hill
[EMAIL PROTECTED]
http://savannah.gnu.org/projects/gnuspeech
--------

The only function of economic forecasting is to make astrology look respectable. (J.K. Galbraith)

--------

On 10-Nov-08, at 9:11 PM, David Hill wrote:

Hi Dalmazio,
I thought you might be interested in the paper of which the attachment is a summary. Basically, the author is saying that the real neutral vocal tract is not a uniform tube: even though it produces formants very similar to those of a uniform tube, the non-uniformity moves the DRM boundaries towards the lips.
Warm regards.

david

-------
The Journal of the Acoustical Society of America, November 2001, Volume 110, Issue 5, pp. 2761-2762
A distinctive region model based on empirical vocal tract area functions (A)
   Brad H. Story
Univ. of Arizona, Speech and Hearing Sci., P.O. Box 210071, Tucson, AZ 85721-0071
The development of the Distinctive Region Model (DRM) [Mrayati et al., Speech Commun. (1988)] is based on theoretically derived acoustic characteristics for a tube of uniform cross-sectional area assumed to approximate a neutral vocal tract configuration. Formant sensitivity functions calculated for the uniform tube are used to divide the vocal tract into distinctive regions that, when constricted or expanded, will cause the formant frequencies to change in a predictable pattern. This study compares the original DRM (based on a uniform tube) with a new version created from a neutral vocal tract area function derived from published MRI data. Because it is subject to physiologic constraints, this neutral area function is nonuniform in cross-sectional area variation but exhibits formant frequencies similar to a uniform tube. Sensitivity function calculations for F1, F2, and F3 also show similarities to those of a uniform tube, but the zero-crossing points that divide the vocal tract into distinctive regions are shifted toward the lips. The result is distinctive regions that are not symmetric about the vocal tract mid-point but rather the back and front regions occupy about 60% and 40% of the total tract length, respectively. [Work supported by NIH R01-DC04789.]
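The uniform-tube regions the abstract starts from can be reproduced from first principles. The sketch below assumes the textbook lossless tube, closed at the glottis (x = 0) and open at the lips (x = 1, normalised length), where the sensitivity function of formant n is proportional (up to sign convention) to cos((2n-1)πx). Its zero crossings fall at x = (2m+1)/(2(2n-1)); the union of the F1-F3 crossings gives the seven interior boundaries of the eight DRM regions. Function names are hypothetical.

```python
from fractions import Fraction

def sensitivity_zero_crossings(n: int):
    """Zero crossings in (0, 1) of cos((2n-1)*pi*x), the assumed sensitivity
    function of formant n for a uniform closed-open tube of unit length."""
    odd = 2 * n - 1
    return [Fraction(2 * m + 1, 2 * odd) for m in range(odd)]

def drm_boundaries(max_formant: int = 3):
    """Union of the F1..F_max zero crossings: interior DRM region boundaries."""
    points = set()
    for n in range(1, max_formant + 1):
        points.update(sensitivity_zero_crossings(n))  # duplicates such as 1/2 collapse
    return sorted(points)

bounds = drm_boundaries(3)
edges = [Fraction(0)] + bounds + [Fraction(1)]
lengths = [b - a for a, b in zip(edges, edges[1:])]
print([str(b) for b in bounds])    # ['1/10', '1/6', '3/10', '1/2', '7/10', '5/6', '9/10']
print([str(s) for s in lengths])   # ['1/10', '1/15', '2/15', '1/5', '1/5', '2/15', '1/15', '1/10']
```

The region lengths read the same from either end, which is the symmetry about the mid-point that Story's MRI-based version breaks.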
--------

_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact
