Non-developer David again:
The issues of running UTAU (and UTAU-derived tools) under a Japanese locale
have been enough to keep me from trying it out.
> Making the user input phonetic symbols instead of actual lyrics is
> not a solution.
Sorry, I didn't mean to propose that. I just wanted to note that a fallback
that allowed phonetic symbols would be necessary.
As to the rest, my (unofficial) thought is that it currently takes quite a
bit of manual intervention to get English working well with the UTAU
toolchain, whether it uses VCV or CVVC. And each approach requires a
different set of tools to connect the samples together. It seems to me that
there's quite a bit of risk of not coming out with something usable at the
end.
-- David
On Tue, Mar 22, 2016 at 7:58 AM, syrma <k.romai...@gmail.com> wrote:
> Thank you for your reply.
>
> As for the playback, I also think that singing each note the moment it is
> entered is impossible; we need to set the lyrics first, and even then, the
> synthesis takes time. But getting it to play as in Cadencii would probably
> be good; that is, pressing play once everything is set. Cadencii takes a
> while to do that, though, and at some point the time spent waiting for the
> synthesis is probably several times the time spent actually editing (that
> said, I think a lot of optimization is possible in Cadencii, so it's
> probably not the best example).
>
> Leaving the questions about dictionaries for later, a side note about my
> struggles with v.Connect-STAND, Cadencii's synthesizing engine. I have
> finally been able to get some results out of it (by switching between my
> Linux and Windows machines every time one of them hits a problem). The
> rendering is more than decent in my opinion (although it depends a lot on
> the settings and the voicebank used, and it could sound worse than
> e-cantorix if not used properly (okay, not that bad, but still)), and I
> think it is an interesting tool overall (some UTAU users import their
> UTAUs into v.Connect-STAND to get a better rendering, but it is sometimes
> a little tricky). However, there are a few points that hinder direct use:
>
> - The Windows binaries won't work unless the system is as Japanese as
> possible, and while I don't know what is causing this yet (because I am not
> used to compiling on Windows), this needs a fix.
> - Encoding auto-detection is probably needed; even my Linux-built version
> expects its input to be encoded as Shift-JIS by default (the typical
> encoding for files created by Japanese users on Windows). It supports
> other encodings, but the user has to specify them.
> - The software takes a meta text sequence file (its own format) and
> outputs an audio file. While I think implementing a conversion from a
> score to a meta text sequence would be sufficient for the first part of
> the project (generating the audio), I believe an optimization might
> optionally be possible. As v.Connect is based on WORLD (which implements
> real-time singing synthesis according to its introduction page), I am
> wondering whether it would be possible to change the code to intercept
> the parameters before the audio is generated and play it in real time. I
> have not dived far enough into v.Connect's code, so if someone who has
> thinks I am heading down a wrong and completely impossible path, please
> do let me know.
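The encoding auto-detection asked for above can be approximated with a simple fallback chain. A minimal sketch in Python; the candidate order (UTF-8 BOM check, strict UTF-8, then CP932, Windows' Shift-JIS variant, then EUC-JP) is my own heuristic, not anything v.Connect-STAND actually does:

```python
# Sketch of an encoding fallback chain for voicebank/sequence files of
# unknown encoding. Candidate order is an assumption: strict UTF-8 is
# tried before CP932 because CP932 will "successfully" decode almost any
# byte stream, while strict UTF-8 rejects Shift-JIS kana bytes.
import codecs

CANDIDATES = ("utf-8", "cp932", "euc_jp")

def decode_sequence_file(data: bytes) -> tuple:
    """Return (text, encoding_name) for a file of unknown encoding."""
    # A UTF-8 BOM settles the question immediately.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-sig"
    for enc in CANDIDATES:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: decode permissively so the caller still gets something.
    return data.decode("cp932", errors="replace"), "cp932 (lossy)"
```

For example, the CP932 bytes of kana text fail the strict UTF-8 attempt and fall through to the CP932 branch, which is what a v.Connect-STAND front end would want by default.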
>
> A very interesting point about it, however, is its ability to convert and
> use UTAU voicebanks, given the great number of downloadable UTAUs on the
> net (let's forget for now about the mass of problems that alone causes).
> While looking into the possibility of using English with UTAU voices, I
> came across, among others, this page: http://utau.wiki/cv-vc (see also:
> utau.wiki/tutorials:cvvc-english-tutorial-by-mystsaphyr ). This seems
> popular enough that a lot of UTAUloids use this method to simulate
> non-Japanese pronunciation. Namine Ritsu, a free voice for
> v.Connect-STAND (and the most popular one), also has recordings of this
> kind, although the way English is rendered is far from perfect, and
> accents are left entirely for the user to simulate. There are also
> (non-open-source) plugins that can convert lyrics (or rather sequence
> files) from CVVC to VCV (another style used in UTAUs). Even though this
> allows the user to get and add voice sets from the internet, I can
> easily think of a few issues one can come across:
>
> - Making the user input phonetic symbols instead of actual lyrics is not
> a solution. I think it may be possible to convert lyrics to espeak
> phonemes and implement the remaining conversion step (which would depend
> on the voice). That gets us to another set of problems: the user would
> need to supply both the word and the hyphenation. And even then, some
> other problems are bound to happen, either because the word isn't in the
> dictionary or because the sound isn't available. In the first case, the
> user may need to provide the pronunciation (for a proper noun, for
> example). Besides this, should we let the user modify the pronunciation
> (after it is automatically generated) to simulate an accent or to make
> something sound more natural?
>
> - Encoding problems, always. Japanese on Windows is unpredictably tricky to
> deal with.
>
> - Voicebanks are usually recorded for one precise language. I could be
> wrong, but for now I don't see how we could detect the language unless
> the user specifies it. Also, some of the Japanese voicebanks are only
> compatible with either romaji or kana (we could use kakasi to convert
> either the lyrics or the voicebank).
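The lyrics-to-phonemes-to-units pipeline from the first bullet could look roughly like the sketch below. The espeak invocation is real (`-q` suppresses audio, `-x` prints espeak's phoneme mnemonics), but the phoneme-to-unit mapping is a toy illustration of the CVVC naming idea ("- CV" for a phrase-initial note, then alternating "V C" transitions and "CV" samples), not the actual utau.wiki conventions:

```python
# Rough sketch: word -> espeak phonemes, and a toy phoneme-list -> CVVC
# unit mapping. The unit-naming scheme below is a simplified assumption,
# not a working converter for real voicebank oto entries.
import shutil
import subprocess
from typing import List, Optional

def espeak_phonemes(word: str) -> Optional[str]:
    """Ask espeak for its phoneme mnemonics; None if espeak is absent."""
    if shutil.which("espeak") is None:
        return None
    out = subprocess.run(["espeak", "-q", "-x", word],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()

VOWELS = {"a", "e", "i", "o", "u"}  # toy inventory, not espeak's

def cvvc_units(phonemes: List[str]) -> List[str]:
    """Turn a flat phoneme list into CV and VC sample names.

    E.g. ["k", "a", "s", "a"] -> ["- ka", "a s", "sa"]: a phrase-initial
    CV, then a VC transition, then the next CV.
    """
    units, prev_vowel, i = [], None, 0
    while i < len(phonemes):
        p = phonemes[i]
        if p in VOWELS:
            # Bare vowel note (no consonant onset).
            units.append(p if prev_vowel else f"- {p}")
            prev_vowel, i = p, i + 1
        else:
            if prev_vowel:
                units.append(f"{prev_vowel} {p}")  # VC transition
            nxt = phonemes[i + 1] if i + 1 < len(phonemes) else ""
            if nxt in VOWELS:
                units.append(f"{p}{nxt}" if prev_vowel else f"- {p}{nxt}")
                prev_vowel, i = nxt, i + 2
            else:
                if not prev_vowel:
                    units.append(p)  # e.g. a leading consonant cluster
                prev_vowel, i = None, i + 1
    return units
```

The missing middle step, mapping espeak's mnemonics onto whatever phoneme set a given voicebank was recorded with, is exactly the per-voice conversion the bullet above says would still have to be implemented.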
>
> Anyway, I don't think one summer of work would be enough to even think
> about all the issues (everything is so much more complicated than it
> first seems). The question would be: how much would make an acceptable
> project?
>
> The project I have in mind for now would be something like the following:
>
> - As a first step, taking care of the usability issues of v.Connect-stand,
> or ideally turning it into a usable library.
> - Implementing the generation of meta text sequences (it would be
> interesting to see how Cadencii, the open-source C++/Qt editor, does
> it). This should include the processing of whatever settings we have
> (including phonemes), as this kind of file should provide all the
> information needed for synthesis.
> - Making a MuseScore plugin out of the two aforementioned items. This
> would additionally include:
> - the front-end (collecting settings)
> - the playback function
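Whatever the meta text sequence format turns out to contain, the score-side half of the second step reduces to walking the score and extracting timed note events. A hypothetical sketch; the `NoteEvent` fields and the toy tuple input are my own invention, not MuseScore's internal model or v.Connect-STAND's format:

```python
# Hypothetical intermediate representation for the score -> meta text
# sequence conversion. v.Connect-STAND's own file format is not
# reproduced here; this only shows the note-event extraction any such
# converter would need as its first step.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class NoteEvent:
    lyric: str       # the syllable sung on this note
    midi_pitch: int  # MIDI note number, e.g. 60 = middle C
    start: float     # onset in beats from the start of the piece
    length: float    # duration in beats

def events_from_score(score: List[Tuple[str, int, float]]) -> List[NoteEvent]:
    """Flatten a toy score of (lyric, midi_pitch, length) into timed events."""
    events, cursor = [], 0.0
    for lyric, pitch, length in score:
        events.append(NoteEvent(lyric, pitch, cursor, length))
        cursor += length
    return events
```

A serializer to the real sequence format would then render each `NoteEvent` (plus the collected settings and phonemes) in whatever syntax v.Connect-STAND expects, which is where studying Cadencii's implementation would pay off.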
>
> Though I don't know if this is relevant to the current discussion (or at
> all), while looking for good free voice data, I found that Namine
> Ritsu's license is very unclear to me (the site the wiki pages link to
> for the terms of use no longer exists). There is a separation between
> the character (visual art, profile, ...) and the voice resources. I
> suspect from the contradicting official information that the license has
> changed over time. The character itself seems to be the property of
> canon, but there don't seem to be any restrictions on the use of the
> voices. In addition, this voicebank
> (http://hal-the-cat.music.coocan.jp/ritsu_e.html) says it is released
> under the terms of GPLv3. I assume at least this voicebank is safe
> enough.
> [Unclear official material:
> - http://www.canon-voice.com/english/kiyaku.html (the English says
> something very unclear about the character, but the voice is free)
> - http://canon-voice.com/ritsu.html ]
>
> So the immediate questions are:
> - Is this a realistic and/or acceptable project?
> - I am not aware of MuseScore's plugin rules, so is such an approach all
> right? If not, what would be a better way?
> - I am not sure where to integrate the second part, but I think the part
> integrated into MuseScore should be as general as possible, so that
> support for other tools can be added gradually.
>
> Sorry for the long post. Please let me know your opinion, and whether I am
> analyzing things wrong!
>
>
>
> --
> View this message in context:
> http://dev-list.musescore.org/GSOC-2016-Regarding-the-Virtual-Singer-project-idea-tp7579698p7579737.html
> Sent from the MuseScore Developer mailing list archive at Nabble.com.
>
>
> ------------------------------------------------------------------------------
> Transform Data into Opportunity.
> Accelerate data analysis in your applications with
> Intel Data Analytics Acceleration Library.
> Click to learn more.
> http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
> _______________________________________________
> Mscore-developer mailing list
> Mscore-developer@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mscore-developer
>