As a follow-up, I just stumbled upon this article:
On 9/6/19 4:52 PM, Patric Schmitz wrote:
On 9/6/19 4:29 PM, Bruno Afonso wrote:
I'd love to hear if others have been using DNNs for audio. I am a bit
more interested in DNNs that process audio (i.e., output processed audio)
than in classic classification approaches, where people are mostly
borrowing ideas from computer vision and classifying based on
spectrogram representations (think STFT).
A former colleague is researching in this area, particularly the
transformation of singing voice emotion. Have a look at this recent paper.
They use a multi-layered recurrent LSTM network in what they call a
sequence-to-sequence architecture, which learns a latent space
representation of f0 contours conditioned on different emotions (anger,
Then there is WaveNet and many recent applications of it and extensions
to specific problem settings:
https://arxiv.org/abs/1609.03499
dupswapdrop: music-dsp mailing list