As a follow-up, I just stumbled upon this article:

On 9/6/19 4:52 PM, Patric Schmitz wrote:
On 9/6/19 4:29 PM, Bruno Afonso wrote:
I'd love to hear whether others have been using DNNs for audio. I am more interested in DNNs that process audio (i.e., output processed audio) than in classic classification approaches, where people mostly borrow ideas from computer vision and classify based on spectrogram representations (think STFT).

A former colleague is researching in this area, particularly the transformation of singing voice emotion. Have a look at this recent paper.

They use a multi-layered recurrent LSTM network in what they call a sequence-to-sequence architecture, which learns a latent-space representation of f0 contours conditioned on different emotions (anger, fear, sadness, ...).

Then there is WaveNet, along with many recent applications of it and extensions to specific problem settings.

dupswapdrop: music-dsp mailing list
