The last month or so I've been working with adding synthetic speech and
voice recognition to my 747 project.
What type of project is that? (FlightGear-related?)
The results have been quite good; unfortunately it's kind of hard to demonstrate or display the results.
lol, right - except of course if you want to shoot a movie :-)
But for those amongst us who have no local Festival engine, it might be illustrative to offer some simple ATC phrases (generated by Festival) for download somewhere. At http://www.cstr.ed.ac.uk/projects/festival/ you can find a link to a web-based interface to a Festival engine, whose output will then be sent to your browser.
Jim Brennan is preparing a corpus of messages and ATC phrases which will be used to create a LM (Language Model) for speech recognition and the synthetic speech voices come from a variety of sources -- most notably, the FestVox folks at CMU, MBROLA, and the OGI-Festival project at CSLU.
I'm not that sure about the speech-recognition part, though - I simply think there are too many variables and limiting factors to really make it feasible in the near future. Maybe I'm simply being too pessimistic ;-)
But of course speech synthesis by itself would already be advantageous to have.
Both the ASR program and TTS program can run as applications (foreground or background) on a single machine interfacing with FG via the loopback IP address 127.0.0.1 or on additional machines connected via a LAN.
...or on tts.flightgear.org as an on demand "stream"-server, offering centralized speech synthesis even for users with slower machines ;-)
Just wondering if there is any interest in adding this capability to FG.
I think definitely yes - TTS/speech-synthesis ideas have been brought up various times here; searching for "TTS" or directly for "festival" within the mail archive returns threads like:
(BTW: even without a locally installed search engine for FlightGear's mailing-list archives on flightgear.org - several were suggested - it would be nice if the addresses at mail-archive.com could be added to
http://www.flightgear.org/mail.html, where it still reads:
"There is currently no search capability [...]")
But getting back to the old discussions: there seems to be great interest in it - most of FlightGear's counterparts are by now equipped with basic TTS functionality, so why not FG, too?
About all that is required is a socket-type (IPC) interface to send the text string to the TTS application in the specified wrapper, and the TTS program (Festival) running in server mode to create an audio signal.
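To illustrate the socket part, here is a minimal sketch of a client for Festival running in server mode (started with `festival --server`, which listens on port 1314 by default and accepts Scheme commands like `(SayText "...")`). The helper names and the loopback host are just assumptions for the sketch:

```python
import socket

FESTIVAL_HOST = "127.0.0.1"   # loopback, as mentioned above
FESTIVAL_PORT = 1314          # Festival's default server-mode port

def make_saytext_command(text):
    """Wrap a text string in the Scheme command Festival expects."""
    # Escape backslashes and double quotes so the string stays a
    # valid Scheme literal.
    escaped = text.replace('\\', '\\\\').replace('"', '\\"')
    return '(SayText "%s")\n' % escaped

def say_text(text):
    """Send one utterance to a running 'festival --server' instance."""
    with socket.create_connection((FESTIVAL_HOST, FESTIVAL_PORT)) as s:
        s.sendall(make_saytext_command(text).encode("ascii"))
        return s.recv(4096)   # Festival replies with status tokens

# say_text("Cleared for takeoff runway two seven left")
```

FG would only need to push the text string through such a socket whenever the property holding the transmission changes.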
I have recently looked into the I/O handling stuff, as well as the ability to use XML-definable protocols; it looks to me as if these two things could come in handy when it really comes to establishing a simple IPC mechanism for FlightGear <-> Festival interaction.
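As a rough sketch of that idea, a generic-protocol definition could expose a property holding the text to be spoken. The node name `/sim/sound/tts/text` below is invented for illustration, and note that raw strings sent this way would still need the `(SayText ...)` wrapper on the Festival side:

```xml
<?xml version="1.0"?>
<PropertyList>
  <generic>
    <output>
      <line_separator>newline</line_separator>
      <chunk>
        <name>tts-text</name>
        <type>string</type>
        <node>/sim/sound/tts/text</node>
      </chunk>
    </output>
  </generic>
</PropertyList>
```

Saved under Protocol/, something like this could then be activated via the usual --generic=socket,... command-line option.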
Maybe use a new node within the property tree that holds not only the string to be processed, but also the respective rules that apply, because one would need to define some kind of aviation-specific "dialect" in order to have Festival speak special parts of a transmission using a separate rule - for example a callsign, which shouldn't be spelled character by character, but rather converted to its "ALPHA-ZULU" equivalents. (Just meant as a simple example, though ...)
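The callsign conversion itself could be as simple as a lookup table mapping each character to its ICAO-alphabet word (digits using the ICAO radio pronunciations TREE, FOWER, FIFE, NINER); a quick sketch, with the helper name invented here:

```python
# Map each callsign character to its ICAO radio-alphabet word.
NATO = {
    'A': 'ALPHA', 'B': 'BRAVO', 'C': 'CHARLIE', 'D': 'DELTA',
    'E': 'ECHO', 'F': 'FOXTROT', 'G': 'GOLF', 'H': 'HOTEL',
    'I': 'INDIA', 'J': 'JULIETT', 'K': 'KILO', 'L': 'LIMA',
    'M': 'MIKE', 'N': 'NOVEMBER', 'O': 'OSCAR', 'P': 'PAPA',
    'Q': 'QUEBEC', 'R': 'ROMEO', 'S': 'SIERRA', 'T': 'TANGO',
    'U': 'UNIFORM', 'V': 'VICTOR', 'W': 'WHISKEY', 'X': 'XRAY',
    'Y': 'YANKEE', 'Z': 'ZULU',
    '0': 'ZERO', '1': 'ONE', '2': 'TWO', '3': 'TREE', '4': 'FOWER',
    '5': 'FIFE', '6': 'SIX', '7': 'SEVEN', '8': 'EIGHT', '9': 'NINER',
}

def speak_callsign(callsign):
    """Expand a callsign into its spoken radio-alphabet form."""
    return ' '.join(NATO.get(c.upper(), c) for c in callsign)
```

A preprocessing rule like this would run before the text ever reaches the synthesizer.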
Of course it would later on also be interesting to have Festival bindings available within the XML-configurable sound files, so that not only can audio files be played, but speech can also be synthesized dynamically on demand. Thinking of the logical implementation of more advanced airliner mechanisms like the GPWS, it would certainly come in very handy to be able to make FlightGear 'talk' to the user, as in "Terrain, Terrain, Terrain" ;-)
In addition to interactive inputs, the TTS program will receive comm traffic from other AI controllers that produce communications with other model entities active in the simulation.
The most frequently requested words/phrases could even be buffered, either locally within FlightGear's base package directory (using a pre-defined buffer size) or indeed remotely, as I already suggested above. That way one could actually create a rather comprehensive repository of common ATC chatter and use a mechanism similar to TerraSync, rsync'ing such snippets on demand. To take care of different bandwidths, the ATC submodel might have to "pre-request" some files, though, so that they are available when needed.
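The local buffering part could boil down to hashing each phrase to a stable file name and only synthesizing on a cache miss - a minimal sketch, where the directory name, helper names, and the synthesize callback are all invented for illustration:

```python
import hashlib
import os

# Hypothetical cache location under the base package directory.
CACHE_DIR = "ATC/cache"

def cache_path(phrase):
    """Map a phrase to a stable file name via its MD5 digest."""
    digest = hashlib.md5(phrase.encode("utf-8")).hexdigest()
    return os.path.join(CACHE_DIR, digest + ".wav")

def get_snippet(phrase, synthesize):
    """Return a cached WAV for the phrase, synthesizing on a miss."""
    path = cache_path(phrase)
    if not os.path.exists(path):
        os.makedirs(CACHE_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(synthesize(phrase))  # e.g. ask the Festival server
    return path
```

The same digest scheme would also give the remote repository stable file names to rsync against.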
Installing, compiling, and configuring the TTS and ASR packages requires a little work, but it's not brain surgery.
While the compiling & packaging part is a no-brainer for the Windows folks, one might ponder offering statically linked versions of Festival for the most common other platforms and putting 1-3 of these packages (including the proper configs and plenty of sounds) directly on the FlightGear CD-ROM: another good reason to purchase the FG CD-ROMs ;-)
And of course, having a subfolder with hundreds of ATC snippets by default becomes as much an advantage as having all the scenery available - at least compared to having to download all those files ...
Both packages are open-source and available (see http://linux-sound.org/speech.html for some sources). The real body of work is in the code and logic to create the AI controller(s) that can respond to a real, live, unstructured input.
I think particularly the latter is extremely challenging, this sounds like work for a whole new project to me :-/
Even *IF* the recognition part is handled well enough, there are still many factors ...
Like, people would definitely have to follow standard procedures - as in real life - which is not going to appeal to novices who don't know anything about the phraseology used.
So all this would need to be dictionary-based, dealing with loops that expect a certain transmission depending on the previous transmissions. That is certainly not clever to hard-code; a scriptable approach would be preferable, possibly using a state machine implemented via XML files for each transmission, so that these are parsed recursively depending on the current context ... now that I'm thinking about it, it does sound manageable, but admittedly also pretty involved ... :-/
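To show the state-machine-via-XML idea is indeed manageable, here is a tiny sketch: each state lists the transmissions it expects and which state each one leads to. The XML format and class name below are entirely invented for illustration:

```python
import xml.etree.ElementTree as ET

# Invented dialog format: states, expected phrases, next states.
DIALOG_XML = """
<dialog start="ground">
  <state name="ground">
    <expect phrase="request taxi" next="taxi"/>
  </state>
  <state name="taxi">
    <expect phrase="ready for departure" next="tower"/>
  </state>
  <state name="tower"/>
</dialog>
"""

class AtcDialog:
    def __init__(self, xml_text):
        root = ET.fromstring(xml_text)
        self.state = root.get("start")
        # state name -> {expected phrase -> next state}
        self.table = {
            s.get("name"): {e.get("phrase"): e.get("next")
                            for e in s.findall("expect")}
            for s in root.findall("state")
        }

    def hear(self, phrase):
        """Advance to the next state if the phrase is expected here."""
        nxt = self.table[self.state].get(phrase)
        if nxt is not None:
            self.state = nxt
            return True
        return False  # unexpected transmission: ask to "say again"
```

An unexpected transmission simply leaves the machine in its current state, which is exactly the "say again" behavior a controller would show.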
But as soon as it all works, there would be an entirely new question: whether the speech synthesis output needs to be artificially distorted just to make it "as real as it gets" ;-)
_______________________________________________
Flightgear-devel mailing list
[EMAIL PROTECTED]
http://mail.flightgear.org/mailman/listinfo/flightgear-devel