Hi Dalmazio,
On Nov 10, 2008, at 9:13 PM, Dalmazio Brisinda wrote:
> Very cool! Much of this sounds quite similar to what we were
> talking about over a month ago re: computation of the resonant
> function based on a smoothly changing radial interpolation function
> depending on where in the tube we were -- but especially at
> boundaries. In this case, they use MRI data for this function.
As I mentioned before, the snag is that more sections would be
needed, the sample rate would increase, and computation speed would
likely be an issue again.
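To put rough numbers on that snag, here is a minimal sketch. It assumes a waveguide tube in which each section contributes one sample of propagation delay, so the sample rate is c * N / L. The function name, the 350 m/s speed of sound, and the 17.5 cm tract length are illustrative assumptions, not values taken from the gnuspeech source.

```python
def required_sample_rate(n_sections, tube_length_m=0.175, speed_of_sound=350.0):
    """Sample rate (Hz) needed so that one section equals one sample of delay.

    Assumption: fs = c * N / L, i.e. the section length L / N is the distance
    sound travels in one sample period.
    """
    if n_sections <= 0 or tube_length_m <= 0:
        raise ValueError("n_sections and tube_length_m must be positive")
    return speed_of_sound * n_sections / tube_length_m


if __name__ == "__main__":
    # Doubling the number of sections doubles the sample rate, and the
    # per-second work grows faster still (more sections times more samples).
    for n in (10, 20, 40):
        fs = required_sample_rate(n)
        print(f"{n:3d} sections -> {fs / 1000:.1f} kHz, "
              f"{n * fs / 1e6:.2f} M section-updates per second")
```

Under these assumptions 10 sections need roughly 20 kHz, and the section-updates per second grow with the square of the section count.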
> Just had a slightly playful thought: I wonder if there is MRI data
> for samples limited to aesthetically pleasing male and female
> voices (separately). I'm sure there would be some physiological
> differences between taking the average of MRI data over a large
> 'random' sample vs. limiting to just 'attractive' samples.
Voice quality has more to do with the glottal excitation function
(including intonation) than vocal tract shape, though some vocal
tract effects are pleasing -- like *clarity* of articulation, on
which we still don't have a good handle. Some speakers seem to adjust
their articulation to maximise clarity by adjusting the formants
for best effect, though not voluntarily (I got that from Walter
Lawrence himself).
> So, I'm curious, what were the subjective results like? I would
> suspect much smoother sounding synthesis, and therefore greater
> intelligibility.
Good topic for a PhD thesis :-) Intelligibility is not synonymous
with better quality synthesis. DECtalk (MITalk) is pretty
intelligible but very tiring to listen to for long periods, which is
probably due in large part to the unnatural rhythm and intonation.
> That shifting of the zero-crossings/DRM boundaries towards the lips
> is also interesting, as is the 60/40 weighting for the length of the
> back half vs. the length of the front half. Is anyone looking into
> incorporating these two changes into gnuspeech? The 60/40 weighting
> change would probably not be too difficult. The change involving
> MRI data to create a non-uniform radial function sounds a
> little more involved, though very interesting!
The real point is that the "rest" state of the tube is non-uniform,
but it produces similar formants to a uniform tube. This means the
boundaries of the tube DRM regions are shifted from the original
theory and the radii have to be different in the rest state. This
almost certainly means, again, more sections are needed. It would
not be that easy but needs to be looked at.
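As a starting point for experimenting with the 60/40 split, here is a purely illustrative sketch. It takes the classical eight-region DRM boundaries for a uniform tube (positions measured from the glottis as fractions of tract length) and applies a piecewise-linear warp that moves the midpoint from 0.5 to 0.6. The warp and the names `UNIFORM_BOUNDARIES` and `warp_boundary` are assumptions made for this example; this is not the procedure in Story's paper, which derives the boundaries from sensitivity functions of an MRI-based area function.

```python
from fractions import Fraction

# Classical uniform-tube DRM boundaries (fractions of tract length from the glottis).
UNIFORM_BOUNDARIES = [
    Fraction(1, 10), Fraction(1, 6), Fraction(3, 10), Fraction(1, 2),
    Fraction(7, 10), Fraction(5, 6), Fraction(9, 10),
]


def warp_boundary(x, back_share=Fraction(3, 5)):
    """Hypothetical piecewise-linear warp: maps 0 -> 0, 1/2 -> back_share, 1 -> 1."""
    half = Fraction(1, 2)
    if x <= half:
        return x * back_share / half
    return back_share + (x - half) * (1 - back_share) / half


if __name__ == "__main__":
    for old in UNIFORM_BOUNDARIES:
        new = warp_boundary(old)
        print(f"{float(old):.3f} -> {float(new):.3f}")
```

The warp only relocates the boundaries. As noted above, the rest-state radii would also have to change, and this sketch does not attempt that.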
Warm regards.
david
--------
David Hill
[EMAIL PROTECTED]
http://savannah.gnu.org/projects/gnuspeech
--------
The only function of economic forecasting is to make astrology look
respectable. (J.K. Galbraith)
--------
On 10-Nov-08, at 9:11 PM, David Hill wrote:
Hi Dalmazio,
I thought you might be interested in the paper of which the
attachment is a summary. Basically, the author is saying that the
real neutral vocal tract is not a uniform tube: it produces
formants very similar to those of a uniform tube, but the
non-uniformity moves the DRM boundaries towards the lips.
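For reference, the uniform-tube formants being compared against follow from the quarter-wavelength resonances of a lossless tube closed at the glottis and open at the lips, F_n = (2n - 1) c / (4 L). A minimal sketch, with the function name, the 17.5 cm length, and the 350 m/s speed of sound chosen as illustrative assumptions:

```python
def uniform_tube_formants(length_m=0.175, speed_of_sound=350.0, count=3):
    """Resonances (Hz) of an ideal lossless tube, closed at one end, open at the other."""
    if length_m <= 0 or count < 1:
        raise ValueError("length_m must be positive and count at least 1")
    return [(2 * n - 1) * speed_of_sound / (4.0 * length_m)
            for n in range(1, count + 1)]


if __name__ == "__main__":
    for n, f in enumerate(uniform_tube_formants(), start=1):
        print(f"F{n} = {f:.0f} Hz")
```

With these assumed values the formants come out near 500, 1500, and 2500 Hz.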
Warm regards.
david
-------
The Journal of the Acoustical Society of America -- November 2001
-- Volume 110, Issue 5, pp. 2761-2762
A distinctive region model based on empirical vocal tract area
functions (A)
Brad H. Story
Univ. of Arizona, Speech and Hearing Sci., P.O. Box 210071,
Tucson, AZ 85721-0071
The development of the Distinctive Region Model (DRM) [Mrayati et
al., Speech Commun. (1988)] is based on theoretically derived
acoustic characteristics for a tube of uniform cross-sectional area
assumed to approximate a neutral vocal tract configuration. Formant
sensitivity functions calculated for the uniform tube are used to
divide the vocal tract into distinctive regions that, when
constricted or expanded, will cause the formant frequencies to
change in a predictable pattern. This study compares the original
DRM (based on a uniform tube) with a new version created from a
neutral vocal tract area function derived from published MRI data.
Because it is subject to physiologic constraints, this neutral area
function is nonuniform in cross-sectional area variation but
exhibits formant frequencies similar to a uniform tube. Sensitivity
function calculations for F1, F2, and F3 also show similarities to
those of a uniform tube, but the zero-crossing points that divide
the vocal tract into distinctive regions are shifted toward the
lips. The result is distinctive regions that are not symmetric
about the vocal tract mid-point but rather the back and front
regions occupy about 60% and 40% of the total tract length,
respectively. [Work supported by NIH R01-DC04789.]
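The uniform-tube half of this comparison can be sketched in a few lines. For an ideal lossless tube closed at the glottis and open at the lips, the sensitivity function of formant n is proportional (up to sign) to cos((2n - 1) * pi * x / L), so its zero crossings fall at x / L = (2k + 1) / (2 (2n - 1)). The sketch below collects those crossings for F1 to F3 and reports the resulting region lengths. The function names are made up for this example, and the MRI-derived case from the abstract is not reproduced because it needs the measured area function.

```python
from fractions import Fraction


def sensitivity_zero_crossings(formant):
    """Zero crossings of the uniform-tube sensitivity function for one formant,
    as fractions of tract length measured from the glottis."""
    m = 2 * formant - 1
    return [Fraction(2 * k + 1, 2 * m) for k in range(m)]


def drm_boundaries(n_formants=3):
    """Sorted union of the zero crossings for formants 1..n_formants."""
    points = set()
    for n in range(1, n_formants + 1):
        points.update(sensitivity_zero_crossings(n))
    return sorted(points)


def region_lengths(boundaries):
    """Lengths of the regions delimited by the boundaries, as fractions of tract length."""
    edges = [Fraction(0)] + list(boundaries) + [Fraction(1)]
    return [b - a for a, b in zip(edges, edges[1:])]


if __name__ == "__main__":
    bounds = drm_boundaries()
    print("boundaries:", [str(b) for b in bounds])
    print("region lengths:", [str(r) for r in region_lengths(bounds)])
```

For the uniform tube the eight regions are symmetric about the midpoint (a 50/50 split), which is the symmetry the abstract reports being broken in favour of roughly 60/40.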
--------