message from Christoph Nuscheler <[email protected]> to
festival-talk
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
On 01.05.2012 17:24, Alan W Black wrote:
Christoph Nuscheler wrote:
Hi,
This is good news, thanks for doing this.
If you are happy we'd like to include this in the standard system
and replace the ancient SAPI code that is there ...
Fine by me. However, so far I haven't really written any SAPI code of my own. I
just modified the old code by David Huggins-Daines of Cepstral so that
it would compile on Win7/VS2010.
Any suggestions? Perhaps someone can identify the bug by listening to
the MP3s... ;-)
So clearly rms and slt are at the wrong sample rate. But kal is 8 kHz
and kal16 is 16 kHz, so that part is working. awb, rms and slt are 16 kHz,
which admittedly might not be ideal for some versions of Windows, and
there may be some silly missing resampling going on in the Windows
drivers themselves.
That was my first idea, too. However, if you speed up the recordings by
a factor of 2, the speed is right again, but it still doesn't sound right.
I already tried to speed up the synthesis itself by dividing all the
values of sapi_ratetab_foo in FliteTTSEngineObj by 2. Again, right
speed, awkward sound. This is what the SAPI synthesis gave me:
http://www.student.uni-augsburg.de/~nuschech/speech-samples-rms-slt-2xRate.mp3
Slt pronounces "default" somewhat like "defaulaut"... I don't know what
to make of this.
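For reference, the arithmetic behind the sample-rate hypothesis can be sketched as follows. This is only an illustration: the function names are made up for this example, and the 8 kHz playback rate is an assumption chosen because it would explain the factor of 2, not something confirmed for the SAPI engine.

```c
#include <assert.h>

/* Hypothetical helper: duration in milliseconds of num_samples PCM frames
   when the audio device plays them at playback_rate Hz. */
long playback_ms(long num_samples, long playback_rate)
{
    assert(playback_rate > 0);
    return num_samples * 1000L / playback_rate;
}

/* Hypothetical helper: perceived speed relative to the intended speed.
   A value below 1 means the audio sounds slowed down. */
double speed_factor(long playback_rate, long true_rate)
{
    assert(true_rate > 0);
    return (double)playback_rate / (double)true_rate;
}
```

Under that assumption, one second of 16 kHz audio (16000 samples) played at 8000 Hz lasts 2000 ms, and speed_factor(8000, 16000) is 0.5. That matches the right speed after a 2x speed-up, but it does not account for the distorted sound.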
FliteTTSEngineObj reads the sample rate of each voice via
get_param_int(curr_vox->features, "sample_rate", 16000);
However, it does make the assumption that all voices produce one channel
and 16 bits per sample. Is that correct for awb, rms and slt?
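To make that assumption concrete, here is a minimal sketch of how the PCM format fields follow from the sample rate once one channel and 16 bits per sample are fixed. The pcm_format struct and make_pcm_format function are hypothetical stand-ins for this example (they mirror the usual fields of a Windows wave format description) and are not taken from FliteTTSEngineObj.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a PCM format description; not the real
   Windows SDK structure and not code from the Flite SAPI engine. */
typedef struct {
    uint16_t channels;          /* 1 = mono */
    uint32_t samples_per_sec;   /* e.g. the voice's "sample_rate" feature */
    uint16_t bits_per_sample;   /* 16 under the assumption discussed here */
    uint16_t block_align;       /* bytes per sample frame */
    uint32_t avg_bytes_per_sec; /* bytes consumed per second of audio */
} pcm_format;

/* Derive the dependent fields from rate, channel count and sample width. */
pcm_format make_pcm_format(uint32_t sample_rate, uint16_t channels,
                           uint16_t bits_per_sample)
{
    pcm_format f;

    assert(channels > 0);
    assert(bits_per_sample % 8 == 0);

    f.channels = channels;
    f.samples_per_sec = sample_rate;
    f.bits_per_sample = bits_per_sample;
    f.block_align = (uint16_t)(channels * (bits_per_sample / 8));
    f.avg_bytes_per_sec = sample_rate * f.block_align;
    return f;
}
```

For a 16 kHz voice, make_pcm_format(16000, 1, 16) gives a block alignment of 2 bytes and 32000 bytes per second. If a voice actually delivered a different sample width or channel count, these derived values would be wrong even though the sample rate was read correctly.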
Did you generate all these in the order we are listening to them? I wonder
if a sample rate value isn't being changed when you change voice.
The order in which I pick the voices doesn't matter. Rms and slt always
sound this wrong, even when I pick them first.
Do you get the same result if you play rms immediately after a reboot?
It doesn't make any difference whether I start the synthesis right after
installation or do a reboot first.
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
= University of Edinburgh's Festival Speech Synthesis System =
= http://festvox.org/festival Sent Via [email protected] =
= To unsubscribe mail [email protected] =
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =