Hi Ken,
On Oct 24, 2010, at 5:52 PM, Kenneth Reid Beesley wrote:
On 22Oct2010, at 15:46, David Hill wrote:
Dear Ken,
Apologies for the delay in replying to your email request.
Hello David,
Many thanks for your message. No apologies are necessary. As far
as I know, you don't owe anything to anyone---and certainly not to me.
About once a year I look around for a practical articulatory text-
to-speech toolkit, complete, supported, documented and ready to
use. I know that's asking a lot.
From what I've been able to read, gnuspeech looks very promising,
so I launch an inquiry from time to time. I hope they come across
as friendly inquiries.
An enquiry about gnuspeech cannot be anything other than friendly.
Your interest is appreciated!
The project at present is not being actively developed. The last
repository update was made approximately 11 months ago by Dalmazio
Brisinda and included major upgrades to Monet and a newly
completed component required (or at least very, very useful) for
new language development -- "Synthesizer", which allows researchers
to determine the acoustic consequences of arbitrary vocal tract
configurations in the tube model (though the current "Synthesizer"
is really a beta release and needs some code clean-up and a few
additional features). There are a number of additional important
"Monet" modules to be ported to manage posture, rule and
transition editing. Unfortunately for you, the sub-modules that
deal with developing these necessary new language data components
within Monet are only stubs at present. Some help in continuing
the port would be most welcome, though I realise that you represent
an end user, and not a developer, so this is not really a call on
you. [Anyone out there interested?]
Thanks for the update. What kind of expertise do you need to
continue development?
I think Dalmazio gave you a good short overview of that. If you are a
Mac guy and have done any work with Cocoa (Interface Builder and
Xcode) and know C, you should be in pretty good shape, apart from
getting up to speed on the model basis for the parameter generation
(it's all there in the NeXT code in the repository, of course, but
reading code is not a lot of fun, especially if you don't already
know what it is doing).
However, there are some papers available for download from my
university web site (Section F of "Published papers" at
http://pages.cpsc.ucalgary.ca/~hill). I've attached an HTML copy of
the relevant section (F) from the "Published papers" page at the
site; the links should let you grab a copy of any paper you'd like to
see, for those papers that are downloadable.
To understand the basis for the synthesis, it is probably worth
reading my 1978 paper first. It formed the basis for implementing the
major component in Monet and the TextToSpeech Server, providing a
desynchronized framework for manipulating the needed parameter tracks
based on databases representing each language. It refers to hardware
synthesizers, but these days synthesizers are run as software, thanks
to phenomenal processor speed. We are using a synthesizer known as
"the tube model" which is simply a full emulation of the branched
acoustic tube that forms the vocal apparatus. The AVIOS paper (1995)
explains how the tube model works. There are several papers related
to the intonation and rhythm research for English on which the rhythm
and intonation modelling of the synthesizer are based. The basic
research is completely described in the 1977 ASA Meeting paper, but
that is not yet on-line. My 1979 and 1992 papers on the topic give
you an idea of what we found, but I suspect the intonation and rhythm
for a native American Indian language would be quite a bit different,
though you may already have collected some data that could be used
for the modelling. We actually carried out some experiments to test
the "goodness" of our intonation and rhythm modelling, as derived
from the data we collected from earlier analyses.
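If it helps to make "the tube model" concrete, here is a toy Python sketch of the general technique: the vocal tract approximated as a chain of cylindrical sections, with a Kelly-Lochbaum-style scattering junction between each pair of adjacent sections. This is only an illustration of the principle -- the area values, the boundary reflection coefficients, and the one-section-per-sample timing are my own simplifying assumptions, not the actual gnuspeech tube model (which also handles the nasal branch, losses, and radiation properly, as the AVIOS paper describes).

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent sections,
    computed from the ratio of neighbouring cross-sectional areas."""
    return [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]

def run_tube(areas, excitation, glottal_reflection=0.7, lip_reflection=-0.85):
    """Propagate an excitation through a lossless chain of tube sections
    (one sample of delay per section); return the output at the lips."""
    n = len(areas)
    ks = reflection_coefficients(areas)
    fwd = [0.0] * n  # right-going wave (towards lips) in each section
    bwd = [0.0] * n  # left-going wave (towards glottis) in each section
    out = []
    for x in excitation:
        new_fwd = [0.0] * n
        new_bwd = [0.0] * n
        # glottis end: inject excitation; partially reflect the returning wave
        new_fwd[0] = x + glottal_reflection * bwd[0]
        # Kelly-Lochbaum scattering at each junction
        for i, k in enumerate(ks):
            new_fwd[i + 1] = (1 + k) * fwd[i] - k * bwd[i + 1]
            new_bwd[i] = k * fwd[i] + (1 - k) * bwd[i + 1]
        # lip end: radiate part of the wave, reflect the rest back inwards
        new_bwd[n - 1] = lip_reflection * fwd[n - 1]
        out.append((1 + lip_reflection) * fwd[n - 1])
        fwd, bwd = new_fwd, new_bwd
    return out

# Made-up area function, glottis (index 0) to lips -- a rough vowel-ish shape
areas = [2.6, 2.6, 0.65, 1.6, 2.6, 4.0, 6.0, 8.0]
output = run_tube(areas, [1.0] + [0.0] * 63)  # 64-sample impulse response
```

The job of the rest of the system (Monet and the databases) is then to supply the time-varying control data -- section areas, glottal excitation and so on -- that drive such a model in real time.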
The detailed empirical process of using Monet, Synthesizer, and a
good quality spectrograph, to create the databases was surprisingly
easy. I really need to write a paper or manual to explain it in
detail, but if you understand the target transition model of speech
(a good bit different to the traditional segmental approach,
especially given the desynchronisation involved in the synthesis
model), the process is pretty obvious and straightforward, though you
need to have access to a corpus of data on the sounds of the language
you wish to create. English has been so well studied for years that
the real targets for vowels and some consonants, the virtual targets
for stop sounds, the noise characteristics of sibilants, and so on,
were readily available. This is a separate issue from the rhythm and
intonation. We created our own model for British English rhythm based
on the analysis of a significant body of both formal and
conversational British English speech. The intonation model is
essentially that of M.A.K. Halliday and David Abercrombie -- well
described in Halliday's book: "A Course in Spoken English
Intonation" (OUP 1970, SBN 19 453066 3), which came with audio tapes,
but is no longer in print. I should get onto OUP and see if they'll
let me put the audio on my web site. We used some of the speech from
those tapes as the basis for our rhythm study. But we did some
experiments to test the model and added some refinements of our own.
These are the kinds of resources and approaches you'll need to create
a synthesis system for a native American Indian language.
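To give a rough feel for the desynchronised, event-based framework of the 1978 paper, here is a small Python sketch in which each synthesizer parameter carries its own track of (time, target) events, so transitions for different parameters need not line up at common segment boundaries. The parameter names, times, and straight-line interpolation are my own illustrative assumptions, not Monet's actual rule-driven transition shapes.

```python
from bisect import bisect_right

def track_value(events, t):
    """Interpolate one parameter track at time t (ms).
    events: time-sorted list of (time_ms, target_value) pairs."""
    times = [time for time, _ in events]
    i = bisect_right(times, t)
    if i == 0:
        return events[0][1]   # before the first event: hold the first target
    if i == len(events):
        return events[-1][1]  # after the last event: hold the last target
    (t0, v0), (t1, v1) = events[i - 1], events[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Two hypothetical parameters whose events deliberately do NOT coincide:
pitch = [(0, 120.0), (180, 140.0), (400, 100.0)]      # Hz
tract_r3 = [(0, 0.8), (150, 1.4), (350, 0.9)]         # a tube-section radius

# Sample every track at one instant to get a synthesizer control frame
frame = {name: track_value(ev, 200)
         for name, ev in [("pitch", pitch), ("r3", tract_r3)]}
```

The point of the desynchronisation is visible here: the pitch and tract-shape targets move on independent schedules, and the synthesizer simply samples all the tracks at each frame time.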
Hope this helps.
Warm regards.
david
----------
F. gnuspeech-related publications
This section collects together those publications of particular
relevance to the GNU Project "gnuspeech"—a system designed to make it
easy to create the databases for real-time articulatory speech
synthesis for arbitrary languages, and provide the real-time
synthesis tools to use the resulting synthesis in applications. This
is for convenience. They duplicate entries in the listings above.
Those who would like to help with the port of the original completely
functional system that ran on the NeXT computer are invited to
contact Professor Hill. Much of the work is done and the real-time
synthesis is available for testing. Some of the database creation
parts are not yet completely ported.
HILL, D.R. (2006) Manual for the Synthesizer application -- part of
the GnuSpeech text-to-speech toolkit. On-line manual relevant to the
real-time articulatory-synthesis-based text-to-speech system
described in the AVIOS 95 paper: "Real-time articulatory
speech-synthesis-by-rules"
HILL, D.R. (1993, 2004) MONET speech synthesis editing system manual
(TextToSpeech Kit tool). On-line manual relevant to the real-time
articulatory-synthesis-based text-to-speech system described in the
AVIOS 95 paper: "Real-time articulatory speech-synthesis-by-rules"
HILL, D.R., MANZARA, L. & SCHOCK, C-R. (1993, 2003) Pronunciation
guide for TextToSpeech kit. Pronunciation guide for Webster's,
Trillium and IPA phonetic transcriptions.
HILL, D.R. (2001) A conceptionary for speech and hearing in the
context of machines and experimentation. This document is a
considerably enlarged and revised version of Hill (1976b) below. It
is designed as an educational and reference tool (check the term
"conceptionary" in the Wikipedia).
HILL, D.R., MANZARA, L. & SCHOCK, C-R. (1995) Manual for the original
NeXT Developer TextToSpeech kit.
HILL, D.R., MANZARA, L. & TAUBE-SCHOCK, C-R. (1995) Real-time
articulatory speech-synthesis-by-rules. Proc. AVIOS '95 14th Annual
International Voice Technologies Conf, San Jose, 12-14 September
1995, 27-44 (C)
HILL, D.R., SCHOCK, C-R. & MANZARA, L. (1992) Unrestricted text-to-
speech revisited: rhythm and intonation. Proc. 2nd. Int. Conf. on
Spoken Language Processing, Banff, Alberta, Canada, October
12th.-16th., 1219-1222 (C)
JASSEM, W., HILL, D.R. & WITTEN, I.H. (1984) Isochrony in English
speech: its statistical validity and linguistic relevance. Pattern,
Process and Function in Discourse Phonology (collection ed. Davydd
Gibbon), Berlin: de Gruyter, 203-225 (J)
HILL, D.R., JASSEM, W. & WITTEN, I.H. (1979) A statistical approach
to the problem of isochrony in spoken British English. Current Issues
in Linguistic Theory 9 (eds. H. & P. Hollien), 285-294, Amsterdam:
John Benjamins B.V. (J)[This paper first appeared as University of
Calgary Computer Science Department "Yellow Series" report # 78/27/6]
HILL, D.R. (1978) A program structure for event-based speech
synthesis by rules within a flexible segmental framework. Int. J. Man-
Machine Studies 10 (3), 285-294, May (J)
HILL, D.R. & REID, N.A. (1977a) An experiment on the perception of
intonational features. Int. J. Man-Machine Studies 9 (2), 337-347 (J)
HILL, D.R. (1977) Some results from a preliminary study of British
English speech rhythm. 94th. Meeting of the Acoustical Society of
America, Miami, Dec 12-16 (Full text available as U of Calgary
Computer Science Dept. Report 78/26/5, contact the author; soon to be
available on-line) (R)
HILL, D.R. (1975a) Avoiding segmentation in speech analysis: problems
and benefits. Proc. 8th. Int. Cong. of Phonetic Sciences, Leeds, UK,
Aug 17-23, paper 128 (C)
HILL, D.R. (1975b) Computer models for synthesising British English
rhythm and intonation. Proc. 8th. Int. Cong. of Phonetic Sciences,
Leeds, UK, Aug 17-23, paper 129 (C)
The existing code-base and software components are quite stable
for both Mac OS X and GNUstep. Have you tried any of them out?
Many thanks for your interest. I hope to have better news for you
"real soon now".
I'm mostly a Mac guy, so I really should see how far I can get with
the existing system. I'll try to do that as soon as I can.
[snip]
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact