Re: keynote gold festival sapi5 voice

Hi all,
Haven't officially tried the Keynote Gold recreation yet but from what I've heard it is interesting. Not quite sure why it's so nostalgic to have this synth, but it definitely is sounding a lot like Keynote Gold from the samples I'm hearing, and I applaud the effort.
Now, there are people who are asking about the program being used to create the voices. Being an audio and language geek of sorts, I decided to try to create my own voice without the aid of a tutorial. I will describe below what I've found for those interested.
This program, called MNLP, seems to allow for not only tts voice creation but lip sync animation as well. I'm not sure what it's aiming to do but it seems to be for a sort of niche audience, which I am not criticizing. I like that sort of stuff.
If you want to create a tts voice, you must first know that you can't just take random recordings, plop them in, and press a create button. This is because the program comes with a basic voice creator that will instruct you on sentences that you must speak. You can then optionally check the sentences to make sure the word boundaries are identified properly.
As Jake pointed out, the interface does not use standard Windows GUI controls that screen readers will recognise. I ended up using Jaws since I am more familiar with its Jaws cursor than with NVDA's object nav, and to the credit of both MNLP and Jaws, it is quite usable this way. Nevertheless, after about 4 hours of trying to use it and explore it, I found myself becoming frustrated and overwhelmed.
When recording a sentence, the program analyzes the recording to identify word boundaries, which you can verify by just clicking on the words. This is important, because if you end up with word boundaries which are too far off, the resulting voice will start to babble and not make a lot of sense with certain words and phrases. The review greatly increases the time it takes to record the sentences. You can turn this word review off, and by doing so you eliminate the long analysis for each recording and the subsequent word review process. The trade-off is that you can't find mistakes and redo the recording more clearly to eliminate them. It's a good thing I checked every single recording too, as some word boundaries were extremely far off, and I never would've caught them without review. One was so bad that the program couldn't even identify all the words in the sentence. I'm not sure why it had so much difficulty, as my speech is not at all hard to understand and I speak a natural American English dialect, which the program is supposedly most trained for. Needless to say, I have lost faith in its automatic detection and I always check it, which is not the quickest of processes.
I spent 2 hours analyzing and then verifying each word boundary on 35 sentences. Then I wanted to see how the voice was sounding, so I compiled it. And that's where frustration set in. The voice took over half an hour to compile on my machine. And that's nowhere near what you'd need to produce a good voice... it's recommended you have a minimum of 1000 sentences recorded to really make a good voice! A voice with that many recordings, as Jake stated above in this topic, would take hours to compile! And that's not even the limit, as the included sentence list has over 3500 sentences you could record.
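To put those numbers in perspective, here is a rough back-of-the-envelope projection in Python. It only uses the figures above (2 hours of review and a bit over half an hour of compiling for 35 sentences) and assumes the time per sentence stays constant as the project grows. That linear scaling is purely my assumption, especially for compilation, so treat the output as a ballpark and not a measurement.

```python
# Rough projection, assuming time scales linearly with sentence count.
# The linear scaling is an assumption, not something measured in MNLP.

def project_hours(measured_hours, measured_sentences, target_sentences):
    """Scale a measured duration linearly to a larger sentence count."""
    return measured_hours / measured_sentences * target_sentences

REVIEW_HOURS = 2.0    # from the post: 2 hours of boundary review for 35 sentences
COMPILE_HOURS = 0.5   # from the post: "over half an hour", so this is a lower bound
SENTENCES_DONE = 35

for target in (1000, 3500):
    review = project_hours(REVIEW_HOURS, SENTENCES_DONE, target)
    compile_time = project_hours(COMPILE_HOURS, SENTENCES_DONE, target)
    print(f"{target} sentences: about {review:.0f} h of review, "
          f"about {compile_time:.0f} h to compile (if linear)")
```

At the pace I managed, the recommended minimum of 1000 sentences would mean somewhere around 57 hours of review alone, which is why turning the word review off is so tempting.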
I am also not a big fan of how the program pre-processes the audio it records. I went through the trouble of cleaning up my mic signal with an fx chain I set up in Reaper with compression, noise gates, and other denoisers I like. This produces a sound which I feel is not bad considering my cheap setup. However, MNLP still seems to want to apply noise reduction, since there are warbly artifacts in its recorded speech which do not exist anywhere else. I even took off all the effects and left in my background noise, and I could tell the program was reducing it, and the warbly, watery artifacts had become worse.
After compilation, I was elated to hear speech produced using a synthetic copy of my voice. However, I immediately noticed problems which could potentially add another long, laborious layer to this. The synthetic copy of my voice was having problems, even with the sentences I recorded. Syllable boundaries were slurred, and the ends of words trailed off or were slurred. Two sentences I had recorded that still had problems were:
"Not at this particular case, Tom, apologized Whittemore."
In the synthetic voice, the word apologized didn't have a proper D sound in it, and the W sounded strangled in the name Whittemore. I had tried to articulate them well during recording, but had noticed when I was checking word boundaries that the D in apologized had been overlooked for some reason and the W in Whittemore had been largely cut from the word boundary as well. I let it go as I wanted to move on, but it seems to have become a problem. One more example:
"He was a head shorter than his companion, of almost delicate physique."
In this case, the synthetic voice almost completely mimics the strange prosody I had used while recording, which I neglected to take much notice of until after the fact, and I found this amusing. However, the K sound at the end of physique was for some reason cut. So when the synth says the word physique, it almost sounds like fizzy. During the recording process, I again noticed that the cut was made in the word boundary but did not want to draw too much attention to a little thing like that. In all of these cases, if I could've gone in and manually edited the recorded data boundaries, I think I would already have a better sounding voice than I do now, even though I've only finished mere dozens of what could become over a thousand recordings. Some other consonants were cut both from my recordings and from the final speech itself. Even in the Keynote Gold samples, you can hear this, as the H from hello was cut by the program. The synthesizer does try to sort of interpolate data across phoneme boundaries where needed, but it can only do so much, and it can't really create things which aren't there without a load of artifacts.
Before you ask, I have deleted my early remnants of a TTS voice as I want to start over when I know a little more about the advanced features of the program. Thus, I can't show you what a voice in this early stage would sound like.
The developer recommends that for optimum results you should perform its speech recognition training so the program can determine your way of speaking and how you articulate phonemes. Unfortunately, satisfying the program that you have a good word list is tricky, partly because of its inaccessible GUI. It also asks for fairly large text files of passages and words, which I suspect should be specially designed to capture all phoneme combinations in English. I've not delved into that, but I am getting the impression that if you are going that deep into the process, you should already have a list of appropriate phrases to record which are not included in the program. You can even create your own lexicons if you are so inclined! So yeah, this program is definitely meant for a tweak head!
Despite my primitive understanding of linguistics and my being ill prepared to tackle a challenge like this, I decided to attempt the speech recognition training anyway. I recorded a few phrases which it recommended. However, to feed the training with potentially useful information, you have to verify phoneme boundaries and detection, and this part of the process induces a massive headache. For a start, once you've recorded your voice, you then have to launch the built-in audio editor, which is completely inaccessible. The editor is full of unlabeled graphics, and while clicking randomly throughout the interface does bring up dialogs which you can navigate through a little more easily with the mouse cursor, you still have no clue what you're doing in its main window. Without labels, it's impossible to verify, split, and edit phonetic representations to your heart's content. The program also mentions importing wav files with phoneme markers as an alternative, but I've yet to find out how that works. If it's going to become a laborious matter of specifying certain information in a text file, I'm willing to undertake this, but I'll wait til later to look into that.
There is also an advanced tts editor that goes beyond the basic one I described above. It allows you to use your speech recognition training data to record a voice, and then later to edit it in much finer detail than you could with the simple creator. Because I am unable to create a training file, I've not yet played with this, but assuming I can make it relatively painless, this is the way I would personally go to make a synthesized voice of myself. I would make sure every boundary is captured properly. Unfortunately, I doubt this would be accessible, judging by what I'm seeing.
On the upside, all the recordings you make for your TTS voice source project are just wav files which have already been noise reduced and otherwise processed by the program. I've given some thought to recording some wav files with the program, verifying word boundaries, and then, before compilation, replacing the wavs with copies of the same takes which were not pre-processed by the program, to see if that would produce at least a cleaner sounding voice. As an audiophile and perfectionist, the differences the pre-processing makes really bug me. I'm also looking into the possibility of manually editing boundary data and creating a voice which really is the best it can be, though I'm not sure what all that would entail at this stage, and I'm not even sure how possible that would be without vision.
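For anyone curious, the wav swap idea could be scripted roughly like this in Python. This is only a sketch of what I have in mind and I have not tried it against a real MNLP project. The folder names are hypothetical, and I am assuming the clean copies have the same file names as the program's recordings. I also assume the word boundaries refer to positions in the audio, so the script only replaces a file when the format and length match exactly, and it backs up the original first.

```python
import shutil
import wave
from pathlib import Path

def wav_info(path):
    """Return (channels, sample_width, frame_rate, frame_count) for a WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes()

def swap_in_clean_copies(project_dir, clean_dir, backup_dir):
    """Replace each project WAV with the same-named file from clean_dir.

    A file is only replaced when its format and length match exactly,
    on the assumption that stored boundaries point at audio positions.
    Returns the list of file names that were replaced.
    """
    project_dir, clean_dir, backup_dir = Path(project_dir), Path(clean_dir), Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    replaced = []
    for processed in sorted(project_dir.glob("*.wav")):
        clean = clean_dir / processed.name
        if not clean.exists():
            continue  # no clean take for this sentence
        if wav_info(clean) != wav_info(processed):
            print(f"skipped {processed.name}: format or length differs")
            continue
        shutil.copy2(processed, backup_dir / processed.name)  # keep the original
        shutil.copy2(clean, processed)
        replaced.append(processed.name)
    return replaced
```

You would call it with something like swap_in_clean_copies("voice_project/wavs", "clean_takes", "wav_backup"), where all three paths are made up for illustration. Whether the compiled voice actually comes out cleaner is exactly what I would want to find out.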
I'm not trying to knock this program or anything, just warning those who want to get really geeky with it that you will have some hurdles to cross. The demo voices which come with the program are quite impressive I think, and I strive to match them in any voices I try to create. I hope this rant of sorts has brought a little helpful information to the table. If you seek to create your own voice, no matter how geeky or casual you want to go about it, I wish you the best of luck.

_______________________________________________
Audiogames-reflector mailing list
Audiogames-reflector@sabahattin-gucukoglu.com
https://sabahattin-gucukoglu.com/cgi-bin/mailman/listinfo/audiogames-reflector