message from Christoph Nuscheler <[email protected]> to festival-talk
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
On 01.05.2012 17:24, Alan W Black wrote:
Christoph Nuscheler wrote:
Hi,

This is good news, thanks for doing this.

If you are happy we'd like to include this in the standard system
and replace the ancient SAPI code that is there ...

Fine by me. However, so far I haven't really written any SAPI code of my own. I just modified the old code by David Huggins-Daines of Cepstral so that it would compile on Win7/VS2010.

Any suggestions? Perhaps someone can identify the bug by listening to the MP3s... ;-)

So clearly rms and slt are at the wrong sample rate. But kal is 8 kHz and kal16 is 16 kHz, so that part is working. awb, rms and slt are 16 kHz, which admittedly might not be ideal for some versions of Windows, and there may be some silly missing resampling going on in the Windows drivers themselves.

That was my first idea, too. However, if you speed up the recordings by a factor of 2, the speed is right again, but it still doesn't sound right. I already tried to speed up the synthesis itself by dividing all the values of sapi_ratetab_foo in FliteTTSEngineObj by 2. Again: right speed, awkward sound. This is what the SAPI synthesis gave me:
http://www.student.uni-augsburg.de/~nuschech/speech-samples-rms-slt-2xRate.mp3
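
For reference, a minimal sketch in C of what a pure sample-rate mismatch would do to playback. The function names and the rates used below are illustrative assumptions, not taken from the Flite or SAPI code. It only shows why a plain mismatch should be fully corrected by a speed change, which is not what the recordings above suggest.

```c
#include <assert.h>

/* Seconds of audio heard when n_samples are played at playback_rate Hz. */
static double playback_seconds(long n_samples, int playback_rate)
{
    assert(playback_rate > 0);
    return (double)n_samples / (double)playback_rate;
}

/* Factor by which tempo and pitch are scaled when audio synthesized at
   synth_rate Hz is played at playback_rate Hz (1.0 means no mismatch). */
static double speed_factor(int synth_rate, int playback_rate)
{
    assert(synth_rate > 0);
    return (double)playback_rate / (double)synth_rate;
}
```

For example, one second of 16 kHz audio (16000 samples) played at 8 kHz lasts two seconds, and speed_factor(16000, 8000) is 0.5. Doubling the playback speed would undo exactly that, so any remaining distortion would have to come from somewhere else.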

Slt pronounces "default" somewhat like "defaulaut"... I don't know what to make of this.

FliteTTSEngineObj reads the sample rate of each voice via get_param_int(curr_vox->features, "sample_rate", 16000). However, it assumes that all voices produce one channel and 16 bits per sample. Is that correct for awb, rms and slt?
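
To make the assumption concrete, here is a minimal sketch in C of how a PCM output format would follow from the sample rate if every voice is mono with 16-bit samples. The pcm_format struct and pcm_format_mono16 are hypothetical names for illustration. They mirror the PCM fields of a Windows WAVEFORMATEX, but this is not the actual FliteTTSEngineObj code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the PCM fields of a Windows WAVEFORMATEX;
   not the real Windows/SAPI type. */
typedef struct {
    uint16_t channels;
    uint32_t samples_per_sec;
    uint16_t bits_per_sample;
    uint16_t block_align;        /* bytes per sample frame */
    uint32_t avg_bytes_per_sec;  /* samples_per_sec * block_align */
} pcm_format;

/* Build a PCM format from a voice's sample rate, ASSUMING mono output
   with 16 bits per sample. If a voice violated that assumption, the
   derived fields would no longer describe its audio data. */
static pcm_format pcm_format_mono16(int sample_rate)
{
    pcm_format f;
    assert(sample_rate > 0);
    f.channels = 1;
    f.bits_per_sample = 16;
    f.samples_per_sec = (uint32_t)sample_rate;
    f.block_align = (uint16_t)(f.channels * f.bits_per_sample / 8);
    f.avg_bytes_per_sec = f.samples_per_sec * f.block_align;
    return f;
}
```

With a sample rate of 16000 this gives a block alignment of 2 bytes and 32000 bytes per second; with 8000 it gives 16000 bytes per second.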

Did you generate all these in the order we are listening to them? I wonder
if a sample rate value isn't being changed when you change voice.

The order in which I pick the voices doesn't matter. Rms and slt always sound this wrong, even when I pick them first.

Do you get the same result if you play rms immediately after a reboot?

It doesn't make any difference whether I start the synthesis right after installation or do a reboot first.

= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
=    University of Edinburgh's Festival Speech Synthesis System       =
= http://festvox.org/festival      Sent Via [email protected] =
=                           To unsubscribe mail [email protected] =
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =

