Hi Ken,
On Oct 24, 2010, at 5:52 PM, Kenneth Reid Beesley wrote:
On 22Oct2010, at 15:46, David Hill wrote:
Dear Ken,
Apologies for the delay in replying to your email request.
Hello David,
Many thanks for your message. No apologies are necessary. As far
as I know, you don't owe anything to anyone---and certainly not to me.
About once a year I look around for a practical articulatory text-
to-speech toolkit, complete, supported, documented and ready to
use. I know that's asking a lot.
From what I've been able to read, gnuspeech looks very promising,
so I launch an inquiry from time to time. I hope they come across
as friendly inquiries.
An enquiry about gnuspeech cannot be anything other than friendly.
Your interest is appreciated!
The project at present is not being actively developed. The last
repository update was made approximately 11 months ago by Dalmazio
Brisinda and included major upgrades to Monet and a newly
completed component required (or at least very, very useful) for
new language development -- "Synthesizer", which allows researchers
to determine the acoustic consequences of arbitrary vocal tract
configurations in the tube model (though the current "Synthesizer"
is really a beta release and needs some code clean-up and a few
additional features). There are a number of additional important
"Monet" modules to be ported to manage posture, rule and
transition editing. Unfortunately for you, the sub-modules that
deal with developing these necessary new language data components
within Monet are only stubs at present. Some help in continuing
the port would be most welcome, though I realise that you represent
an end user, and not a developer, so this is not really a call on
you. [Anyone out there interested?]
Thanks for the update. What kind of expertise do you need to
continue development?
I think Dalmazio gave you a good short overview of that. If you are a
Mac guy and have done any work with Cocoa (Interface Builder and
Xcode) and know C, you should be in pretty good shape, apart from
getting up to speed on the model basis for the parameter generation
(it's all there in the NeXT code in the repository, of course, but
reading code is not a lot of fun, especially if you don't already
know what it is doing).
However, there are some papers available for download from my
university web site (Section F of "Published papers" at
http://pages.cpsc.ucalgary.ca/~hill). I've attached an HTML copy of
the relevant section (F) from the "Published papers" page at the
site; the links should let you grab a copy of any paper you'd like to
see, for those papers that are downloadable.
To understand the basis for the synthesis, it is probably worth
reading my 1978 paper first. It formed the basis for implementing the
major component in Monet and the TextToSpeech Server, providing a
desynchronized framework for manipulating the needed parameter tracks
based on databases representing each language. It refers to hardware
synthesizers, but these days synthesizers are run as software, thanks
to phenomenal processor speed. We are using a synthesizer known as
"the tube model" which is simply a full emulation of the branched
acoustic tube that forms the vocal apparatus. The AVIOS paper (1995)
explains how the tube model works. There are several papers related
to the intonation and rhythm research for English on which the rhythm
and intonation modelling of the synthesizer are based. The basic
research is completely described in the 1977 ASA Meeting paper, but
that is not yet on-line. My 1979 and 1992 papers on the topic give
you an idea of what we found, but I suspect the intonation and rhythm
for a native American Indian language would be quite a bit different,
though you may already have collected some data that could be used
for the modelling. We actually carried out some experiments to test
the "goodness" of our intonation and rhythm modelling, as derived
from the data we collected from earlier analyses.
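If it helps to make "the tube model" concrete, here is a toy Python sketch of the general technique: the vocal tract approximated as a chain of cylindrical sections, with a Kelly-Lochbaum-style scattering junction between each pair of adjacent sections. This is only an illustration of the principle -- the area values, the boundary reflection coefficients, and the one-section-per-sample timing are my own simplifying assumptions, not the actual gnuspeech tube model (which also handles the nasal branch, losses, and radiation properly, as the AVIOS paper describes).

```python
def reflection_coefficients(areas):
    """Reflection coefficient at each junction between adjacent sections,
    computed from the ratio of neighbouring cross-sectional areas."""
    return [(a1 - a2) / (a1 + a2) for a1, a2 in zip(areas, areas[1:])]

def run_tube(areas, excitation, glottal_reflection=0.7, lip_reflection=-0.85):
    """Propagate an excitation through a lossless chain of tube sections
    (one sample of delay per section); return the output at the lips."""
    n = len(areas)
    ks = reflection_coefficients(areas)
    fwd = [0.0] * n  # right-going wave (towards lips) in each section
    bwd = [0.0] * n  # left-going wave (towards glottis) in each section
    out = []
    for x in excitation:
        new_fwd = [0.0] * n
        new_bwd = [0.0] * n
        # glottis end: inject excitation; partially reflect the returning wave
        new_fwd[0] = x + glottal_reflection * bwd[0]
        # Kelly-Lochbaum scattering at each junction
        for i, k in enumerate(ks):
            new_fwd[i + 1] = (1 + k) * fwd[i] - k * bwd[i + 1]
            new_bwd[i] = k * fwd[i] + (1 - k) * bwd[i + 1]
        # lip end: radiate part of the wave, reflect the rest back inwards
        new_bwd[n - 1] = lip_reflection * fwd[n - 1]
        out.append((1 + lip_reflection) * fwd[n - 1])
        fwd, bwd = new_fwd, new_bwd
    return out

# Made-up area function, glottis (index 0) to lips -- a rough vowel-ish shape
areas = [2.6, 2.6, 0.65, 1.6, 2.6, 4.0, 6.0, 8.0]
output = run_tube(areas, [1.0] + [0.0] * 63)  # 64-sample impulse response
```

The job of the rest of the system (Monet and the databases) is then to supply the time-varying control data -- section areas, glottal excitation and so on -- that drive such a model in real time.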
The detailed empirical process of using Monet, Synthesizer, and a
good quality spectrograph, to create the databases was surprisingly
easy. I really need to write a paper or manual to explain it in
detail, but if you understand the target transition model of speech
(a good bit different to the traditional segmental approach,
especially given the desynchronisation involved in the synthesis
model), the process is pretty obvious and straightforward, though you
need to have access to a corpus of data on the sounds of the language
you wish to create. English has been so well studied for years that
the real targets for vowels and some consonants, the virtual targets
for stop sounds, the noise characteristics of sibilants, and so on,
were readily available. This is a separate issue from the rhythm and
intonation. We created our own model for British English rhythm based
on the analysis of a significant body of both formal and
conversational British English speech. The intonation model is
essentially that of M.A.K. Halliday and David Abercrombie -- well
described in Halliday's book: "A Course in Spoken English
Intonation" (OUP 1970, SBN 19 453066 3), which came with audio tapes,
but is no longer in print. I should get onto OUP and see if they'll
let me put the audio on my web site. We used some of the speech from
those tapes as the basis for our rhythm study. But we did some
experiments to test the model and added some refinements of our own.
These are the kinds of resources and approaches you'll need to create
a synthesis system for a native American Indian language.
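To give a rough feel for the desynchronised, event-based framework of the 1978 paper, here is a small Python sketch in which each synthesizer parameter carries its own track of (time, target) events, so transitions for different parameters need not line up at common segment boundaries. The parameter names, times, and straight-line interpolation are my own illustrative assumptions, not Monet's actual rule-driven transition shapes.

```python
from bisect import bisect_right

def track_value(events, t):
    """Interpolate one parameter track at time t (ms).
    events: time-sorted list of (time_ms, target_value) pairs."""
    times = [time for time, _ in events]
    i = bisect_right(times, t)
    if i == 0:
        return events[0][1]   # before the first event: hold the first target
    if i == len(events):
        return events[-1][1]  # after the last event: hold the last target
    (t0, v0), (t1, v1) = events[i - 1], events[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Two hypothetical parameters whose events deliberately do NOT coincide:
pitch = [(0, 120.0), (180, 140.0), (400, 100.0)]      # Hz
tract_r3 = [(0, 0.8), (150, 1.4), (350, 0.9)]         # a tube-section radius

# Sample every track at one instant to get a synthesizer control frame
frame = {name: track_value(ev, 200)
         for name, ev in [("pitch", pitch), ("r3", tract_r3)]}
```

The point of the desynchronisation is visible here: the pitch and tract-shape targets move on independent schedules, and the synthesizer simply samples all the tracks at each frame time.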
Hope this helps.
Warm regards.
david
----------
F. gnuspeech-related publications
This section collects together those publications of particular
relevance to the GNU Project "gnuspeech"—a system designed to make it
easy to create the databases for real-time articulatory speech
synthesis for arbitrary languages, and provide the real-time
synthesis tools to use the resulting synthesis in applications. This
is for convenience. They duplicate entries in the listings above.
Those who would like to help with the port of the original completely
functional system that ran on the NeXT computer are invited to
contact Professor Hill. Much of the work is done and the real-time
synthesis is available for testing. Some of the database creation
parts are not yet completely ported.
HILL, D.R. (2006) Manual for the Synthesizer application -- part of
the GnuSpeech text-to-speech toolkit. On-line manual relevant to the
real-time articulatory-synthesis-based text-to-speech system
described in the AVIOS 95 paper: "Real-time articulatory
speech-synthesis-by-rules"
HILL, D.R. (1993, 2004) MONET speech synthesis editing system manual
(TextToSpeech Kit tool). On-line manual relevant to the real-time
articulatory-synthesis-based text-to-speech system described in the
AVIOS 95 paper: "Real-time articulatory speech-synthesis-by-rules"
HILL, D.R., MANZARA, L. & SCHOCK, C-R. (1993, 2003) Pronunciation
guide for TextToSpeech kit. Pronunciation guide for Webster's,
Trillium and IPA phonetic transcriptions.
HILL, D.R. (2001) A conceptionary for speech and hearing in the
context of machines and experimentation. This document is a
considerably enlarged and revised version of Hill (1976b) below. It
is designed as an educational and reference tool (check the term
"conceptionary" in the Wikipedia).
HILL, D.R., MANZARA, L. & SCHOCK, C-R. (1995) Manual for the original
NeXT Developer TextToSpeech kit.
HILL, D.R., MANZARA, L. & TAUBE-SCHOCK, C-R. (1995) Real-time
articulatory speech-synthesis-by-rules. Proc. AVIOS '95 14th Annual
International Voice Technologies Conf, San Jose, 12-14 September
1995, 27-44 (C)
HILL, D.R., SCHOCK, C-R. & MANZARA, L. (1992) Unrestricted text-to-
speech revisited: rhythm and intonation. Proc. 2nd. Int. Conf. on
Spoken Language Processing, Banff, Alberta, Canada, October
12th.-16th., 1219-1222 (C)
JASSEM, W., HILL, D.R. & WITTEN, I.H. (1984) Isochrony in English
speech: its statistical validity and linguistic relevance. Pattern,
Process and Function in Discourse Phonology (collection ed. Davydd
Gibbon), Berlin: de Gruyter, 203-225 (J)
HILL, D.R., JASSEM, W. & WITTEN, I.H. (1979) A statistical approach
to the problem of isochrony in spoken British English. Current Issues
in Linguistic Theory 9 (eds. H. & P. Hollien), 285-294, Amsterdam:
John Benjamins B.V. (J)[This paper first appeared as University of
Calgary Computer Science Department "Yellow Series" report # 78/27/6]
HILL, D.R. (1978) A program structure for event-based speech
synthesis by rules within a flexible segmental framework. Int. J. Man-
Machine Studies 10 (3), 285-294, May (J)
HILL, D.R. & REID, N.A. (1977a) An experiment on the perception of
intonational features. Int. J. Man-Machine Studies 9 (2), 337-347 (J)
HILL, D.R. (1977) Some results from a preliminary study of British
English speech rhythm. 94th. Meeting of the Acoustical Society of
America, Miami, Dec 12-16 (Full text available as U of Calgary
Computer Science Dept. Report 78/26/5, contact the author; soon to be
available on-line) (R)
HILL, D.R. (1975a) Avoiding segmentation in speech analysis: problems
and benefits. Proc. 8th. Int. Cong. of Phonetic Sciences, Leeds, UK,
Aug 17-23, paper 128 (C)
HILL, D.R. (1975b) Computer models for synthesising British English
rhythm and intonation. Proc. 8th. Int. Cong. of Phonetic Sciences,
Leeds, UK, Aug 17-23, paper 129 (C)
The existing code-base and software components are quite stable
for both Mac OS X and GNUstep. Have you tried any of them out?
Many thanks for your interest. I hope to have better news for you
"real soon now".
I'm mostly a Mac guy, so I really should see how far I can get with
the existing system. I'll try to do that as soon as I can.
[snip]
_______________________________________________
gnuspeech-contact mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/gnuspeech-contact