Re: [Mscore-developer] (GSOC 2016) Regarding the Virtual Singer project idea...

David Cuny Mon, 21 Mar 2016 12:05:18 -0700

Mentioning again that I'm not a developer, so take everything with a grain
of salt here:


> However, aside from the fact that it requires an internet connection (and
> that might hinder some users), I am not sure about the juridical aspect of
> it (will it remain free forever?).

I think it's fair to assume that at some point the web service will stop
being available.

> The open source version is definitely the easiest to exploit, but as I
> lack feasibility knowledge, inputs on the matter are very much welcome.

The last time I looked into it, the Sinsy source didn't include the tools
needed to train the HMM. While those are available elsewhere, there is
obviously effort required to:

   * Learn the HMM training tools
   * Record a corpus in the target language
   * Train the HMM
   * Add the HMM to Sinsy

Taking the long-term view, having more than a single voice database is a
good thing (no matter what the language), so the maintenance cost of
building voices is obviously a factor.


> Moreover, in the ideal project where the audio would be played during
> editing on MuseScore, how much can the delay affect the user's experience?

While sung playback during editing would be ideal, that may be a bit
difficult in practice.

Vocaloid can play single syllables during editing with no real delay, but
requires time to "compile" anything longer.

UTAU has what appears to be a fairly long delay when it constructs the
output since it has to call a number of programs to glue things together.

Band in a Box uses Sinsy, and you can hear MIDI playback during editing,
but calling Sinsy is a separate step, and generates the entire song.
Response time from the web service is fairly good - I've never been tempted
to get coffee.


> How many settings would be necessary for it to work on Mscore, and if
> possible, which ones?

Before addressing this, I'll mention a few text-to-phoneme issues you might
have to deal with:

* Vocaloid had mixed British and American pronunciations in their
dictionary, which led to bad results.
* Users need to be able to override dictionary values, obviously. Sinsy
allows this using square brackets: http://sinsy.sp.nitech.ac.jp/reference.pdf
* Which phoneme system will you use? How can the users see the list of
phonemes?
* Can the users access the dictionary, to choose between options?
* For dictionary lookup, will you rebuild the word from its syllables
("catalog"), or look it up using the supplied hyphenation ("cat-a-log")?
* What happens if the hyphenation is wrong, which often happens?
* What happens if the word isn't in the dictionary? Is there a fallback
algorithm?
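
To make the lookup questions above concrete, here is a minimal Python sketch of
one possible order of precedence: user override first, then a dictionary entry
for the rejoined word, then a fallback rule. Everything in it is an assumption
made for illustration: the toy dictionary, the ARPAbet-like phoneme symbols, the
`syllable[phonemes]` override convention (not necessarily Sinsy's syntax) and
the one-letter-per-phoneme fallback.

```python
import re

# Hypothetical toy dictionary: one phoneme string per syllable.
# The symbols are illustrative, not those of Sinsy or any real engine.
DICTIONARY = {
    "catalog": ["K AE", "T AH", "L AO G"],
    "hello": ["HH AH", "L OW"],
}

# Assumed override convention: lyric text followed by [phonemes].
OVERRIDE = re.compile(r"^(.*?)\[(.+)\]$")


def fallback(syllable):
    """Placeholder letter-to-sound rule: one 'phoneme' per letter."""
    return " ".join(ch.upper() for ch in syllable if ch.isalpha())


def phonemes_for_word(syllables):
    """Return one phoneme string per lyric syllable of a single word.

    Precedence: bracket override, dictionary entry for the rejoined
    word, then the fallback rule.
    """
    result = [None] * len(syllables)
    plain = []
    for i, syl in enumerate(syllables):
        text = syl.strip("-")  # drop the hyphen that joins syllables
        match = OVERRIDE.match(text)
        if match:
            result[i] = match.group(2)
            text = match.group(1)
        plain.append(text)

    entry = DICTIONARY.get("".join(plain).lower())
    # Trust the dictionary only when its syllable count matches the
    # hyphenation in the score; otherwise the hyphenation is suspect.
    usable = entry is not None and len(entry) == len(syllables)

    for i, text in enumerate(plain):
        if result[i] is None:
            result[i] = entry[i] if usable else fallback(text)
    return result
```

Calling `phonemes_for_word(["cat-", "a-", "log"])` uses the dictionary entry,
while a wrongly hyphenated `["ca-", "talog"]` drops to the fallback because the
syllable counts disagree. A real implementation would probably want to realign
the syllables rather than give up on the dictionary.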


The most important settings to control (IMNSHO) are:

* Minimum duration of note to get automatic vibrato;
* Percent of note to apply vibrato to; and
* How much legato to apply between notes, including under/overshoot
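
As a rough illustration of how these three settings might be represented, here
is a small Python sketch. The field names, the default values and the choice to
place the vibrato at the end of the note are all assumptions, not taken from
Sinsy, Vocaloid or MuseScore. The legato and overshoot values are only stored
here; no pitch glide is computed.

```python
from dataclasses import dataclass


@dataclass
class SingerSettings:
    # All names and defaults below are illustrative assumptions.
    min_vibrato_note_sec: float = 0.5  # shorter notes get no vibrato
    vibrato_portion: float = 0.6       # fraction of the note with vibrato
    legato_sec: float = 0.05           # glide time between notes
    overshoot_semitones: float = 0.0   # positive: overshoot, negative: undershoot


def vibrato_span(note_start, note_dur, settings):
    """Return (start, end) in seconds of the vibrato region, or None.

    Assumes the vibrato covers the final part of the note.
    """
    if note_dur < settings.min_vibrato_note_sec:
        return None
    length = note_dur * settings.vibrato_portion
    end = note_start + note_dur
    return (end - length, end)
```

For example, with `vibrato_portion=0.5` a two-second note starting at 1.0 s
would get vibrato over its second half, and a quarter-second note would get
none at the default threshold.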

-- David


On Mon, Mar 21, 2016 at 3:25 AM, syrma <k.romai...@gmail.com> wrote:

> David Cuny wrote
> > Non-developer jumping in again.
> >
> > *Sinsy* supports English, and can be accessed via a web service. Send a
> > MusicXML file in, and get a .wav file back. For implementation details,
> > see:
> >
> > https://pypi.python.org/pypi/sinsy-cli/
> >
> > Since Sinsy says it works well with MuseScore's MusicXML (it says so on
> > the Sinsy page), this is probably the simplest approach. Of course, it
> > requires an internet connection.
>
> Thank you for the link!
> I mentioned briefly the web service, but I am not very confident about
> using it. I tried the service myself with some MusicXML files and I must
> say the results are impressive. However, aside from the fact that it
> requires an internet connection (and that might hinder some users), I am
> not sure about the juridical aspect of it (will it remain free forever?).
> The open source version is definitely the easiest to exploit, but as I
> lack feasibility knowledge, inputs on the matter are very much welcome.
> Moreover, in the ideal project where the audio would be played during
> editing on MuseScore, how much can the delay affect the user's experience?
> Especially since we tend to need a frequent preview even after very small
> editing. I will keep the idea in mind, though.
>
>
> Tobias Platen wrote
> > I'm currently working on an eSpeak fork with better singing support (no
> > external perl script needed) and also on an MBROLA replacement based on
> > WORLD. For a good singing synthesizer you have to combine multiple of
> > those programs.
>
> Indeed, I think one should make the best use out of existent projects to
> get better results. By the way I have been through your code on QTau
> (mostly the vconnect_synth part), and I wondered how far have you exactly
> gotten with using v.Connect-STAND? I have been quite interested in it
> lately, mainly because it seems we can get some good results out of it,
> but it seems overly buggy with anything that isn't Japanese, and there is
> little doc available. It took me a little while to get it to convert an
> Utau database on Linux (all thanks to your debian package), but I'm still
> struggling with it on Windows using Cadencii. I think a better
> configuration would be needed to get it to work without all this trouble
> (looking back, the way Sinsy compiled and worked so easily is probably a
> big plus)
>
> On another (closely related) subject, I have been discussing on IRC with
> Lasconic the kind of Virtual Singer project that would be the best for
> MuseScore, and I meant to ask the two of you how would you see it working?
> How many settings would be necessary for it to work on Mscore, and if
> possible, which ones? (assuming that ideally, it would let the user
> play/preview the song the way it is now possible to play notes) The ones I
> can think of, by looking at something like Cadencii, are the word
> dictionary (language?), the renderer/synthesiser (if we use more than
> one), ... There are also settings about the singing style (decay, accent,
> some settings for rising and falling movement), and some parameters to
> pass to World, but I am not sure I am making enough sense of all of them.
>
>
>
> --
> View this message in context:
> http://dev-list.musescore.org/GSOC-2016-Regarding-the-Virtual-Singer-project-idea-tp7579698p7579723.html
> Sent from the MuseScore Developer mailing list archive at Nabble.com.
>
>
> ------------------------------------------------------------------------------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
> _______________________________________________
> Mscore-developer mailing list
> Mscore-developer@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mscore-developer
>

