----- Original Message -----
From: "David Luff" <[EMAIL PROTECTED]>
To: "FlightGear developers discussions" <[EMAIL PROTECTED]>
Sent: Tuesday, September 21, 2004 6:49 AM
Subject: Re: [Flightgear-devel] A voice for FG

> On 9/17/04 at 2:04 PM John Wojnaroski wrote:
> >----- Original Message -----
> >From: "Jon Stockill" <[EMAIL PROTECTED]>
> >To: "FlightGear developers discussions" <[EMAIL PROTECTED]>
> >Sent: Friday, September 17, 2004 11:54 AM
> >Subject: Re: [Flightgear-devel] A voice for FG
> >
> >
> >> John Wojnaroski wrote:
> >> > Hi,
> >> >
> >> > The last month or so I've been working with adding synthetic speech
> >> > and voice recognition to my 747 project. The results have been quite
> >> > unfortunately it's kind of hard to demonstrate or display the
> >> >
> >> > Jim Brennan is preparing a corpus of messages and ATC phrases which
> >> > will be used to create a LM (Language Model) for speech recognition
> >> > and the synthetic speech voices come from a variety of sources -- most
> >> > notably, the FestVox folks at CMU, MBROLA, and the OGI-Festival project
> >> > at CSLU.
> >>
> >> I was working with the pre-release of festival 2.0 at work last week,
> >> and the new synthesis methods and voices that are available in that
> >> release sound particularly impressive. I did think of the possibility of
> >> using it for air traffic control, if not "live" then as an easy method
> >> to generate a batch of samples for use in a similar way to the way ATIS
> >> works at the moment.
> >>
> >The approach that I've taken is to start a festival server on a networked
> >machine and a small client program that receives a text message as a
> >string, stuffs it into a festival protocol wrapper and calls the
> >festivalStringToWave() method. This also will allow you to send control
> >commands and files to the server to change voices, LMs, etc.
> >
> >"../bin/festival --server loopback"  starts the server and any client on
> >the local machine can connect by default. Connections over a LAN require a
> >small Scheme script to add users to the festival_access_list as part of the
> >argument list.
> >
> >The client program then has a few lines of socket code to connect to FG.
> >On the FG side all you need is something to send a text string over the
> >socket. Something like FGVoice::fg_say_mesg("this is a test"); There are a
> >couple of good examples in the /examples/ directory which I used to create
> >an "atc_net_demo.cpp" application.
> >
> >The voice recognition is just as easy (actually easier to set up) but
> >training the model, building the Acoustic model, and the dictionary plus
> >any special phones is a little more involved. If you don't mind a bit of a
> >delay (around 2-3 sec) to decode the audio, you can use the existing models
> >and get pretty good results. The resultant text string is sent to the AI
> >controller where it is parsed into tokens and analyzed (compiled?).
> >
> >I'm not sure how all of this would fit into FG. I suspect the easiest way
> >would be to create a voice object and a few methods and leave it up to
> >the individual user if they want to set up the TTS festival package or ASR
> >programs.
> >
> This all sounds very exciting, especially the encouraging results from the
> voice recognition stage, and the fact that Jon thinks that Festival 2 is
> sounding pretty good.  Could you send me the code you've got so far for
> sending strings across to FG?
Actually, there are three parts: the ASR, which runs on my laptop, converts
speech to text and sends the text string over the LAN to the AI app; the AI
app analyzes it, generates a response, and sends that text over to the
festival engine (TTS); and festival converts it back to audio. I'll send
along more details via private mail and attach the tar files.
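
For the last hop (AI app to festival), here is a minimal sketch of how a
response string might be wrapped before it is written to the server socket.
It assumes the server is fed Scheme expressions and uses Festival's SayText
command; the helper name make_saytext_command is hypothetical and is not
taken from the tar files. The socket code itself is left out.

```cpp
#include <string>

// Hypothetical helper: wrap plain text in a Festival Scheme expression.
// Double quotes and backslashes are escaped so that the text cannot end
// the string literal early when the server reads the expression.
std::string make_saytext_command(const std::string& text) {
    std::string cmd = "(SayText \"";
    for (char c : text) {
        if (c == '"' || c == '\\') {
            cmd += '\\';  // escape character for the Scheme reader
        }
        cmd += c;
    }
    cmd += "\")";
    return cmd;
}
```

Calling make_saytext_command("this is a test") gives the string
(SayText "this is a test"), which a client could then send over its
connection to the festival server.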

> I'm a bit unclear which parts you are actually working on.  Are you working
> on the decoding of the speech to text-strings only so far, or have you
> actually started on logically decoding the text strings for ATC-AI?  This
> is the part I'm currently in deep thought about.
I'm working on the speech-to-text and text-to-speech ends; the middle ATC/AI
part is the really tough one. Have you ever looked at www.vatsim.org or
www.ivao.org ? A different approach using real "actors" to create a virtual
world. But, alas, it's all MS based...

> A few random thoughts.  Speech recognition for ATC ought to be easier than
> the general case, since the smaller vocabulary ought to mean that better
> guesses can be made, if this sort of thing can be specified to the ASR
> engine.

Yes, yes, and yes
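
To illustrate why the small vocabulary helps, here is a rough sketch of a
post-filter over a list of candidate transcriptions. This is only an
illustration and not how the ASR engine itself is configured (that is done
through the language model and dictionary); the function name
pick_in_vocabulary and the idea of an N-best list being available are
assumptions.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical post-filter: return the first candidate whose words all
// belong to the allowed ATC vocabulary. Returns an empty string when no
// candidate qualifies. Candidates with no words at all are skipped.
std::string pick_in_vocabulary(const std::vector<std::string>& hypotheses,
                               const std::set<std::string>& vocabulary) {
    for (const std::string& hyp : hypotheses) {
        std::istringstream words(hyp);
        std::string word;
        bool any_word = false;
        bool all_known = true;
        while (words >> word) {
            any_word = true;
            if (vocabulary.count(word) == 0) {
                all_known = false;  // out-of-vocabulary word, reject
                break;
            }
        }
        if (any_word && all_known) {
            return hyp;
        }
    }
    return "";
}
```

With a vocabulary of {cleared, for, takeoff, runway, two, seven}, a candidate
such as "cleared for take off runway to seven" is rejected and the next
candidate, "cleared for takeoff runway two seven", is accepted.
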
> Having it on another PC could be nice from the point of view that the
> engine sound from the FG PC can come from the speakers - with ATC from the
> TTS/ASR PC put though headphones to fairly realistically simulate the real
> environment.
Exactly how I'm setting up my sim... and it does impart a better sense and
feel, especially if you mount the engine speakers to the rear with
left/right stereo.

John W.

Flightgear-devel mailing list
