message from Junichi Yamagishi <[email protected]> to festival-talk
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Hi all, 

The Centre for Speech Technology Research at the University of Edinburgh has
five PhD studentships in speech technology.
Please distribute this to any interested candidates.

Best Regards,
Dr. Junichi Yamagishi 

-----------

Five fully-funded PhD studentships in speech and language processing are 
available at the Centre for Speech Technology Research, University of 
Edinburgh. The expected starting date for these studentships is September 2012.

One PhD studentship is supported by the EPSRC Natural Speech Technology 
Project, three are supported by the JST CREST uDialogue Project, and one will 
be supported by industrial funding. 

The projects cover a wide variety of topics in the areas of speech synthesis,
speech recognition, language modelling, and spoken dialogue processing. They
all include exciting opportunities to work with our project partners in the UK
or Japan.

Speech synthesis topics include reactive statistical parametric speech 
synthesis in which various conversational acoustic and verbal cues are 
controllable, new acoustic models for statistical parametric speech synthesis 
inspired by recent innovations such as subspace modelling and deep learning, 
prosody modelling beyond the sentence for audio book tasks, and expressive 
speech synthesis.

Speech recognition topics centre on multilingual speech recognition, in
particular language modelling approaches that can share parameters across
languages.

Spoken dialogue processing topics include implicit spoken dialogue systems that 
do not require the full attention of the user and that learn when to intervene 
in a conversation, structural learning of spoken dialogue contents, and 
crowd-sourcing for learning dialogue content.

Full descriptions of the topics can be found below. 

Suitable candidates will have a good first degree in a relevant discipline and
a strong interest in speech processing, machine learning, statistics, cognitive
science, linguistics, informatics, engineering, mathematics, or a related area.
A relevant Master's degree is desirable but not essential.

Potential applicants are encouraged to contact Simon King, Steve Renals, or
Junichi Yamagishi to discuss the topics. Contact details can be found at
http://www.cstr.ed.ac.uk/people . For information on the formal application
process, please see http://www.cstr.ed.ac.uk/opportunities/phd.html and
http://www.ed.ac.uk/schools-departments/informatics/postgraduate/apply/overview .


All topics are flexible and we welcome applicants with their own original ideas.

The anticipated start date is September 2012 but this is also flexible; earlier 
start dates are possible.  

Applications submitted before 16 December 2011 are preferred.



-----------

DEEP ARCHITECTURES FOR STATISTICAL SPEECH SYNTHESIS

The goal of this project is to statistically model not only how speech sounds 
but also how it is produced. This will be done by developing models with 'deep 
architectures'.

We have already developed a two-layer time-series statistical model of speech
and have applied it to the joint modelling of spectral and articulatory
features, including tongue movements captured using electromagnetic
articulography. Using this model, the generated synthetic speech can be
explicitly controlled via articulation.
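
To make the idea of joint modelling concrete for prospective applicants, here
is a minimal numerical sketch: a toy joint Gaussian over stacked acoustic and
articulatory feature vectors. This is not the actual CSTR model, and all data
and dimensions are invented for illustration. Conditioning the joint
distribution on a chosen articulatory configuration shifts the acoustic
distribution, which is one simple way in which articulatory control of the
output can work:

# Illustrative sketch only (not the CSTR two-layer model): a joint
# Gaussian over stacked acoustic and articulatory features, conditioned
# on a chosen articulatory configuration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3-dim acoustic features paired with 2-dim articulatory
# features (standing in for, e.g., tongue positions from EMA).
acoustic = rng.normal(size=(1000, 3))
articulatory = 0.5 * acoustic[:, :2] + 0.1 * rng.normal(size=(1000, 2))

z = np.hstack([acoustic, articulatory])   # stacked joint feature vectors
mu = z.mean(axis=0)
sigma = np.cov(z, rowvar=False)

dy = 3                                    # acoustic dimensionality
mu_y, mu_x = mu[:dy], mu[dy:]
S_yx = sigma[:dy, dy:]
S_xx = sigma[dy:, dy:]

def acoustic_given_articulation(x_star):
    """Mean of p(acoustic | articulatory = x_star) under the joint Gaussian."""
    return mu_y + S_yx @ np.linalg.solve(S_xx, x_star - mu_x)

# Moving the (hypothetical) articulatory target shifts the predicted
# acoustics: the essence of explicit control via articulation.
print(acoustic_given_articulation(np.array([0.0, 0.0])))
print(acoustic_given_articulation(np.array([1.0, -1.0])))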

In order to incorporate additional knowledge into the acoustic modelling of
speech and to further increase the controllability of speech synthesis, this
project will develop a deeply layered model, inspired by human speech
production and perception, with layers corresponding not only to articulatory
features but also to other meaningful features such as vocal fold vibration,
captured via images of the glottis.

Other layers could incorporate rich linguistic knowledge intrinsic to the
speech or speaker, such as dialect, or external factors such as the
signal-to-noise ratio or perceptual masking effects, thus creating
synthesisers that can be controlled in response to the listening situation.
Such an approach raises a number of scientific questions, including how to
acquire and parameterise features for deep architectures, how to train the
models and the structures between the layers, and how to represent and apply
prior knowledge.

This project will include an industrial internship.

Contact: Dr Junichi Yamagishi

-----------

REACTIVE CONVERSATIONAL SPEECH SYNTHESIS

The project will develop acoustic models of speech and methods for natural 
language generation that reflect the various causes and cues in conversational 
speech and allow greater control for speech synthesis.

Currently, statistical parametric speech synthesis is trained on speech data
that mainly comprises read news-text sentences. However, speech synthesis in
dialogue systems requires a more conversational style. Synthetic speech created
from read-text models sounds quite unlike genuine conversational speech in
terms of its acoustic properties, linguistic construction and the
conversational markers which are produced in response to the conversation
partner. Conversational speech is not only more casually articulated but
contains many interesting effects such as hesitations, prolongations, filled
pauses and so on. These are thought to assist the listener and lead to a more
effective conversational flow. To create such reactive conversational synthetic
speech, this project will consider both the acoustic and language models,
incorporating new factors that are currently missing from read-text speech
synthesisers.

This project is supported by the Japan Science and Technology Agency (JST) and
includes an extended visit to the Nagoya Institute of Technology in Japan, our
partner in the 'uDialogue' project, which is described below.

Contact: Dr Junichi Yamagishi or Prof Simon King

-----------

DISTRIBUTED LINGUISTIC REPRESENTATIONS FOR MULTILINGUAL SPOKEN LANGUAGE 
PROCESSING

Until recently, the dominant language models for speech recognition were based
on n-grams, in which probability models are built over a vocabulary of words,
resulting in very high dimensionality (frequently 1 million words or more). In
recent years there has been growing interest in models which use distributed
representations of words, for example latent semantic analysis language models
and neural network language models. The latter, in particular, have proven to
be very attractive. In this project we plan to explore models in which the
distributed representation is automatically learned, enabling words to be
embedded in what may be considered a semantic space. We are particularly
interested in investigating approaches based on deep neural networks, on
hierarchical Bayes, and on ideas from factorised language model approaches
such as Model M.
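
For readers less familiar with this line of work, the following is a toy
sketch of a feedforward neural network language model in the spirit of Bengio
et al. (the vocabulary, dimensions and parameters are all invented for
illustration; this is not one of the models named above). Each word is mapped
to a learned low-dimensional embedding, and the next word is predicted from
the concatenated embeddings of the preceding words, so the model can share
statistics between words with similar usage rather than estimating each
n-gram separately:

# Toy feedforward neural language model (illustrative sketch only):
# words get learned low-dimensional embeddings, and the next word is
# predicted from the concatenated embeddings of the two preceding words.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["<s>", "the", "cat", "dog", "sat", "ran", "</s>"]
V, D, H, context = len(vocab), 4, 8, 2      # vocab, embed, hidden, n-1
w2i = {w: i for i, w in enumerate(vocab)}

# Parameters: an embedding table plus a one-hidden-layer predictor.
E = 0.1 * rng.normal(size=(V, D))           # distributed representations
W1 = 0.1 * rng.normal(size=(context * D, H))
W2 = 0.1 * rng.normal(size=(H, V))

def softmax(a):
    a = a - a.max()
    e = np.exp(a)
    return e / e.sum()

def next_word_probs(w_prev2, w_prev1):
    """p(next word | two preceding words), computed via the embedding space."""
    x = np.concatenate([E[w2i[w_prev2]], E[w2i[w_prev1]]])
    h = np.tanh(x @ W1)
    return softmax(h @ W2)

# Before training the distribution is near-uniform; training (omitted for
# brevity) would pull embeddings of similarly used words together, letting
# the model generalise where an n-gram model must back off.
p = next_word_probs("the", "cat")
print({w: round(float(p[w2i[w]]), 3) for w in vocab})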

We are interested in applying these ideas to language modelling for speech 
recognition in a multilingual context.  How can we bootstrap a language model 
in a target language given an existing language model in a source language?  
Can we use  semantic or other distributed representations that are sharable 
across languages?

This project is supported by the Japan Science and Technology Agency (JST) and
includes an extended visit to the Nagoya Institute of Technology in Japan, our
partner in the 'uDialogue' project, which is described below.

Contact: Prof Steve Renals

-----------

IMPLICIT SPOKEN DIALOGUE SYSTEMS

Current dialogue systems do not interact very naturally - for example, they
often demand 100% of the user's attention and attempt to interpret and respond
to every utterance. In the long term we would like to develop spoken language
systems which learn how to interact more naturally in multiparty
conversations, and which can learn from their own experience and from other
dialogues and dialogue systems. There are a number of potential PhD topics in
this direction.

(a) Human-computer dialogues can be improved if the computer is able to
interpret - and appropriately respond to - the social signals of the talkers,
the social context of the conversation, and the content of the conversation.
This requires developing approaches to automatically extract and recognise the
social signals (such as positivity/negativity, frustration, or engagement) in
a conversation, and to infer the social context of the conversation based on
these signals, the words that are spoken, and any external metadata (such as
location, time of day, etc.).

(b) Automatic learning of the structure of dialogues, and using such structure 
to enable new dialogue scenarios to be constructed from existing dialogues.  
This project would aim to develop hierarchical representations of dialogues 
based on automatically recognised dialogue acts, on patterns of speaker turns 
and dialogue act usage, and on observed social signals.  Possible applications 
of such structuring would include: (i) the development of dialogue systems 
which learn when they are being addressed, or when would be an appropriate 
moment to make an utterance; and (ii)  the development of new dialogue systems 
via the automatic reuse of dialogue components based on structural similarity.

This project is supported by the Japan Science and Technology Agency (JST) and
includes an extended visit to the Nagoya Institute of Technology in Japan, our
partner in the 'uDialogue' project, which is described below.

Contact: Prof Steve Renals or Dr Junichi Yamagishi

-----------

EXPRESSIVE SPEECH SYNTHESIS

Within the last five years, rapid progress has been made in many areas of
speech synthesis, and we now have systems which are often as intelligible as
natural speech and capable of acceptable naturalness in a very limited range
of applications based on read text. However, prosodically interesting,
expressive and engaging speech synthesis remains a major challenge. This
project will build on our recent work in creating speech synthesis that is
controllable in terms of spectral characteristics, but will focus on the
prosodic aspects of the synthetic speech. One goal is expressive synthesis for
creating audio books from text. A number of potential topics and directions of
research are possible in this area; two are given below, but we welcome
applicants with other suggestions.

(a) Wide-context textual features for speech synthesis. Most current systems
ignore text features beyond the current sentence, but in a discourse there are
rich features waiting to be exploited. Whilst a full semantic analysis of
discourse remains very challenging, shallow processing techniques, possibly
including unsupervised approaches, may discover sufficient features to
significantly improve the expressivity and appropriateness of synthetic speech
for specific tasks such as audiobook reading.

(b) Expressivity control. Whilst more expressive synthetic speech can be 
simulated through the use of recordings of acted speech in the required style, 
this offers no explicit control over the output. The goal of this topic would 
be to develop a deeper and more structured model of `expressivity' which 
enables external, parametric control over individual aspects of the output 
speech, including its prosody.

This project is supported by the EPSRC Programme Grant 'Natural Speech 
Technology', described below, and includes the opportunity to collaborate with 
and visit our project partners at the Universities of Sheffield and Cambridge 
and the possibility of testing these new approaches to expressive speech 
synthesis in home-care and assistive communication aid applications.

Contact: Prof Simon King

-----------


About the uDialogue project

uDialogue is a joint project with the Nagoya Institute of Technology in Japan,
funded by the Japan Science and Technology Agency (JST). The overall goal of
uDialogue is the development of spoken dialogue systems based on
user-generated content, and the project contains research on speech synthesis,
speech recognition, and spoken dialogue. Each of the PhD studentships is of
four years' duration and includes a six-month internship at the Nagoya
Institute of Technology.

http://www.cstr.ed.ac.uk
http://www.sp.nitech.ac.jp

---------- 


About the Natural Speech Technology project

Natural Speech Technology (NST) is an EPSRC Programme Grant with the aim of
significantly advancing the state of the art in speech technology by making it
more natural, approaching human levels of reliability, adaptability and
conversational richness. NST is a collaboration between the Centre for Speech
Technology Research (CSTR) at the University of Edinburgh, the Speech Group at
the University of Cambridge, and the Speech and Hearing Research Group
(SpandH) at the University of Sheffield.

http://www.natural-speech-technology.org


---------- 
