Hi again,

I have now managed to build an initial draft of my Clustergen voice, with a 
mere 500 prompts as its foundation. An example that I made can be found at:
https://dl.dropbox.com/u/5121962/birth_500.wav

Now, I have two questions:

1. For a Clustergen voice, is it true that the larger the dataset, the better 
the result? Will I notice a great difference if I record 700 more prompts in 
the same style as the existing ones and then rebuild, or are there other 
factors that influence the quality more? Also, should I be using clustering 
based on individual vectors, or on trajectories? The latter currently fails 
for me, but I suspect that this might be because it needs more data (regular 
vector clustering also failed when I had only 100 prompts).

2. As you can hear, the voice sounds very monotone and slightly too high in 
pitch; my natural voice is about 30 or 40 Hz lower than that. I am trying to 
figure out how I might change the average pitch and its variation. I looked 
at voicedir/etc/f0.params and found some promising values there. I changed 
them and then reran everything after the step:

./bin/do_clustergen f0

I reran every step after this point, and my edited f0.params file was not 
overwritten, yet there is no change in the output. Should I be doing 
something differently?

Kind regards,

Philip Bennefall
_______________________________________________
Festlang-talk mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/festlang-talk
