Hi again,
I have now managed to build an initial draft of my Clustergen voice, with a
mere 500 prompts as its foundation. An example that I made can be found at:
https://dl.dropbox.com/u/5121962/birth_500.wav
Now, I have two questions:
1. For a Clustergen voice, is it true that the larger the dataset is the better
the result will be? Will I notice a great difference if I record 700 more
prompts in the same style as the ones I have, and then rebuild? Or are there
other factors that influence the quality more? Should I be using clustering
based on individual vectors, or trajectories? The latter fails for me at
present, but I'm thinking that this might be because it needs more data
(regular vector clustering failed with 100 prompts).
2. As you can hear, the voice sounds very monotone and slightly too high in
pitch. My natural voice is about 30 or 40 Hz lower than that. I am trying to
figure out how I might change the average pitch and the variation. I looked
at voicedir/etc/f0.params and found some promising values there. I changed
them and then reran everything after the step that says:
./bin/do_clustergen f0
The rerun did not overwrite my edited f0.params, and yet there is no change
in the output. Should I be doing something differently?
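To be concrete about what I mean by changing the average pitch and the
variation: I have in mind a simple linear retargeting of the voiced F0
values, along the lines of the sketch below. The function name and the
numbers are just my own illustration, not anything taken from the Festival
or Festvox scripts:

```python
def retarget_f0(f0_values, new_mean, new_std):
    """Shift and scale the voiced F0 values (unvoiced frames are 0.0)
    so that their mean and standard deviation match the targets.

    This is only an illustration of the intended transform, not how
    Clustergen itself applies the values in etc/f0.params.
    """
    voiced = [f for f in f0_values if f > 0]
    old_mean = sum(voiced) / len(voiced)
    old_std = (sum((f - old_mean) ** 2 for f in voiced) / len(voiced)) ** 0.5
    return [
        new_mean + (f - old_mean) * (new_std / old_std) if f > 0 else 0.0
        for f in f0_values
    ]

# A contour averaging 160 Hz, retargeted roughly 35 Hz lower with a
# wider spread to counteract the monotone quality:
contour = [150.0, 160.0, 170.0, 0.0, 155.0, 165.0]
lowered = retarget_f0(contour, new_mean=125.0, new_std=20.0)
```

In other words, I want to lower the mean and widen the standard deviation,
and I assumed the mean/range values in f0.params played this role at some
stage of the build.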
Kind regards,
Philip Bennefall
_______________________________________________
Festlang-talk mailing list
[email protected]
https://lists.berlios.de/mailman/listinfo/festlang-talk