message from Junichi Yamagishi <[email protected]> to festival-talk
= = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = = =
Hi all, 

The Centre for Speech Technology Research at the University of Edinburgh has
five PhD studentships in speech technology.
Please distribute this to any interested candidates.

Best Regards,
Dr. Junichi Yamagishi 

-----------

Five fully-funded PhD studentships in speech and language processing are 
available at the Centre for Speech Technology Research, University of 
Edinburgh. The expected starting date for these studentships is September 2012.

One PhD studentship is supported by the EPSRC Natural Speech Technology 
Project, three are supported by the JST CREST uDialogue Project, and one will 
be supported by industrial funding. 

The projects cover a wide variety of topics in the areas of speech synthesis,
speech recognition, language modelling, and spoken dialogue processing. They
all include exciting opportunities to work with our project partners in the UK
or Japan.

Speech synthesis topics include reactive statistical parametric speech 
synthesis in which various conversational acoustic and verbal cues are 
controllable, new acoustic models for statistical parametric speech synthesis 
inspired by recent innovations such as subspace modelling and deep learning, 
prosody modelling beyond the sentence for audio book tasks, and expressive 
speech synthesis.

Speech recognition topics centre on multilingual speech recognition, in
particular language modelling approaches that can share parameters across
languages.

Spoken dialogue processing topics include implicit spoken dialogue systems that 
do not require the full attention of the user and that learn when to intervene 
in a conversation, structural learning of spoken dialogue contents, and 
crowd-sourcing for learning dialogue content.

Full descriptions of the topics can be found below. 

Suitable candidates will have a good first degree in a relevant discipline and
a strong interest in speech processing, machine learning, statistics, cognitive
science, linguistics, informatics, engineering, mathematics, or a related area.
A relevant Master's degree is desirable but not essential.

Potential applicants are encouraged to contact Simon King, Steve Renals, or
Junichi Yamagishi to discuss the topics. Contact details can be found at
http://www.cstr.ed.ac.uk/people . For information on the formal application
process, please see http://www.cstr.ed.ac.uk/opportunities/phd.html and
http://www.ed.ac.uk/schools-departments/informatics/postgraduate/apply/overview .


All topics are flexible and we welcome applicants with their own original ideas.

The anticipated start date is September 2012 but this is also flexible; earlier 
start dates are possible.  

Applications submitted before 16 December 2011 are preferred.



-----------

DEEP ARCHITECTURES FOR STATISTICAL SPEECH SYNTHESIS

The goal of this project is to statistically model not only how speech sounds 
but also how it is produced. This will be done by developing models with 'deep 
architectures'.

We have already developed a two-layer time-series statistical model of speech
and have applied it to the joint modelling of spectral and articulatory
features, including tongue movements captured using electromagnetic
articulography. Using this model, the generated synthetic speech can be
explicitly controlled via articulation.
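
To make the idea of joint modelling concrete for prospective applicants, here
is a minimal numerical sketch: a toy joint Gaussian over stacked acoustic and
articulatory feature vectors. This is not the actual CSTR model, and all data
and dimensions are invented for illustration. Conditioning the joint
distribution on a chosen articulatory configuration shifts the acoustic
distribution, which is one simple way in which articulatory control of the
output can work:

# Illustrative sketch only (not the CSTR two-layer model): a joint
# Gaussian over stacked acoustic and articulatory features, conditioned
# on a chosen articulatory configuration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 3-dim acoustic features paired with 2-dim articulatory
# features (standing in for, e.g., tongue positions from EMA).
acoustic = rng.normal(size=(1000, 3))
articulatory = 0.5 * acoustic[:, :2] + 0.1 * rng.normal(size=(1000, 2))

z = np.hstack([acoustic, articulatory])   # stacked joint feature vectors
mu = z.mean(axis=0)
sigma = np.cov(z, rowvar=False)

dy = 3                                    # acoustic dimensionality
mu_y, mu_x = mu[:dy], mu[dy:]
S_yx = sigma[:dy, dy:]
S_xx = sigma[dy:, dy:]

def acoustic_given_articulation(x_star):
    """Mean of p(acoustic | articulatory = x_star) under the joint Gaussian."""
    return mu_y + S_yx @ np.linalg.solve(S_xx, x_star - mu_x)

# Moving the (hypothetical) articulatory target shifts the predicted
# acoustics: the essence of explicit control via articulation.
print(acoustic_given_articulation(np.array([0.0, 0.0])))
print(acoustic_given_articulation(np.array([1.0, -1.0])))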

In order to incorporate additional knowledge into the acoustic modelling of
speech and to further increase the controllability of speech synthesis, this
project will develop a deeply layered model, inspired by human speech
production and perception, with layers corresponding not only to articulatory
features but also to other meaningful features such as vocal fold vibration,
captured via images of the glottis.

Other layers could incorporate rich linguistic knowledge intrinsic to the
speech or speaker, such as dialect, or external factors such as the
signal-to-noise ratio or perceptual masking effects, thus creating
synthesisers that can be controlled in response to the listening situation.
Such an approach raises a number of scientific questions, including how to
acquire and parameterise features for deep architectures, how to train the
models and the structures between the layers, and how to represent and apply
prior knowledge.

This project will include an industrial internship.

Contact: Dr Junichi Yamagishi

-----------

REACTIVE CONVERSATIONAL SPEECH SYNTHESIS

The project will develop acoustic models of speech and methods for natural 
language generation that reflect the various causes and cues in conversational 
speech and allow greater control for speech synthesis.

Currently, statistical parametric speech synthesis is trained on speech data
that mainly comprises read news-text sentences. However, speech synthesis in
dialogue systems requires a more conversational style. Synthetic speech created
from read-text models sounds quite unlike genuine conversational speech in
terms of its acoustic properties, linguistic construction and the
conversational markers which are produced in response to the conversation
partner. Conversational speech is not only more casually articulated but
contains many interesting effects such as hesitations, prolongations, filled
pauses and so on. These are thought to assist the listener and lead to a more
effective conversational flow. To create such reactive conversational synthetic
speech, this project will consider both the acoustic and language models,
incorporating new factors that are currently missing from read-text speech
synthesisers.

This project is supported by the Japan Science and Technology Agency (JST) and
includes an extended visit to the Nagoya Institute of Technology in Japan, our
partner in the 'uDialogue' project, which is described below.

Contact: Dr Junichi Yamagishi or Prof Simon King

-----------

DISTRIBUTED LINGUISTIC REPRESENTATIONS FOR MULTILINGUAL SPOKEN LANGUAGE 
PROCESSING

Until recently, the dominant language models for speech recognition were based
on n-grams, in which probability models are built over a vocabulary of words,
resulting in very high dimensionality (frequently 1 million words or more). In
recent years there has been growing interest in models which use distributed
representations of words, for example latent semantic analysis language models
and neural network language models. The latter, in particular, have proven to
be very attractive. In this project we plan to explore models in which the
distributed representation is automatically learned, enabling words to be
embedded in what may be considered a semantic space. We are particularly
interested in investigating approaches based on deep neural networks, on
hierarchical Bayes, and on ideas from factorised language model approaches
such as Model M.
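
For readers less familiar with this line of work, the following is a toy
sketch of a feedforward neural network language model in the spirit of Bengio
et al. (the vocabulary, dimensions and parameters are all invented for
illustration; this is not one of the models named above). Each word is mapped
to a learned low-dimensional embedding, and the next word is predicted from
the concatenated embeddings of the preceding words, so the model can share
statistics between words with similar usage rather than estimating each
n-gram separately:

# Toy feedforward neural language model (illustrative sketch only):
# words get learned low-dimensional embeddings, and the next word is
# predicted from the concatenated embeddings of the two preceding words.
import numpy as np

rng = np.random.default_rng(0)

vocab = ["<s>", "the", "cat", "dog", "sat", "ran", "</s>"]
V, D, H, context = len(vocab), 4, 8, 2      # vocab, embed, hidden, n-1
w2i = {w: i for i, w in enumerate(vocab)}

# Parameters: an embedding table plus a one-hidden-layer predictor.
E = 0.1 * rng.normal(size=(V, D))           # distributed representations
W1 = 0.1 * rng.normal(size=(context * D, H))
W2 = 0.1 * rng.normal(size=(H, V))

def softmax(a):
    a = a - a.max()
    e = np.exp(a)
    return e / e.sum()

def next_word_probs(w_prev2, w_prev1):
    """p(next word | two preceding words), computed via the embedding space."""
    x = np.concatenate([E[w2i[w_prev2]], E[w2i[w_prev1]]])
    h = np.tanh(x @ W1)
    return softmax(h @ W2)

# Before training the distribution is near-uniform; training (omitted for
# brevity) would pull embeddings of similarly used words together, letting
# the model generalise where an n-gram model must back off.
p = next_word_probs("the", "cat")
print({w: round(float(p[w2i[w]]), 3) for w in vocab})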

We are interested in applying these ideas to language modelling for speech 
recognition in a multilingual context.  How can we bootstrap a language model 
in a target language given an existing language model in a source language?  
Can we use  semantic or other distributed representations that are sharable 
across languages?

This project is supported by the Japan Science and Technology Agency (JST) and
includes an extended visit to the Nagoya Institute of Technology in Japan, our
partner in the 'uDialogue' project, which is described below.

Contact: Prof Steve Renals

-----------

IMPLICIT SPOKEN DIALOGUE SYSTEMS

Current dialogue systems do not interact very naturally - for example, they
often demand 100% of the user's attention and attempt to interpret and respond
to every utterance. In the long term we would like to develop spoken language
systems which learn how to interact more naturally in multiparty
conversations, and which can learn from their own experience and from other
dialogues and dialogue systems. There are a number of potential PhD topics in
this direction.

(a) Human-computer dialogues can be improved if the computer is able to
interpret - and appropriately respond to - the social signals of the talkers,
the social context of the conversation, and the content of the conversation.
This requires developing approaches to automatically extract and recognise the
social signals (such as positivity/negativity, frustration, or engagement) in
a conversation, and to infer the social context of the conversation based on
these signals, the words that are spoken, and any external metadata (such as
location, time of day, etc.).

(b) Automatic learning of the structure of dialogues, and using such structure 
to enable new dialogue scenarios to be constructed from existing dialogues.  
This project would aim to develop hierarchical representations of dialogues 
based on automatically recognised dialogue acts, on patterns of speaker turns 
and dialogue act usage, and on observed social signals.  Possible applications 
of such structuring would include: (i) the development of dialogue systems 
which learn when they are being addressed, or when would be an appropriate 
moment to make an utterance; and (ii)  the development of new dialogue systems 
via the automatic reuse of dialogue components based on structural similarity.

This project is supported by the Japan Science and Technology Agency (JST) and
includes an extended visit to the Nagoya Institute of Technology in Japan, our
partner in the 'uDialogue' project, which is described below.

Contact: Prof Steve Renals or Dr Junichi Yamagishi

-----------

EXPRESSIVE SPEECH SYNTHESIS

Within the last five years, rapid progress has been made in many areas of
speech synthesis, and we now have systems which are often as intelligible as
natural speech and capable of acceptable naturalness in a very limited range
of applications based on read text. However, prosodically interesting,
expressive and engaging speech synthesis remains a major challenge. This
project will build on our recent work in creating speech synthesis that is
controllable in terms of spectral characteristics, but will focus on the
prosodic aspects of the synthetic speech. One goal is expressive synthesis for
creating audio books from text. A number of potential topics and directions of
research are possible in this area; two are given below, but we welcome
applicants with other suggestions.

(a) Wide-context textual features for speech synthesis. Most current systems
ignore text features beyond the current sentence, but in a discourse there are
rich features waiting to be exploited. Whilst a full semantic analysis of
discourse remains very challenging, shallow processing techniques, possibly
including unsupervised approaches, may discover sufficient features to
significantly improve the expressivity and appropriateness of synthetic speech
for specific tasks such as audiobook reading.

(b) Expressivity control. Whilst more expressive synthetic speech can be 
simulated through the use of recordings of acted speech in the required style, 
this offers no explicit control over the output. The goal of this topic would 
be to develop a deeper and more structured model of `expressivity' which 
enables external, parametric control over individual aspects of the output 
speech, including its prosody.

This project is supported by the EPSRC Programme Grant 'Natural Speech 
Technology', described below, and includes the opportunity to collaborate with 
and visit our project partners at the Universities of Sheffield and Cambridge 
and the possibility of testing these new approaches to expressive speech 
synthesis in home-care and assistive communication aid applications.

Contact: Prof Simon King

-----------


About the uDialogue project

uDialogue is a joint project with the Nagoya Institute of Technology in Japan,
funded by the Japan Science and Technology Agency (JST). The overall goal of
uDialogue is the development of spoken dialogue systems based on
user-generated content, and the project contains research on speech synthesis,
speech recognition, and spoken dialogue. Each of the PhD studentships is of
four years' duration and includes a six-month internship at the Nagoya
Institute of Technology.

http://www.cstr.ed.ac.uk
http://www.sp.nitech.ac.jp

---------- 


About the Natural Speech Technology project

Natural Speech Technology (NST) is an EPSRC Programme Grant with the aim of
significantly advancing the state of the art in speech technology by making it
more natural, approaching human levels of reliability, adaptability and
conversational richness. NST is a collaboration between the Centre for Speech
Technology Research (CSTR) at the University of Edinburgh, the Speech Group at
the University of Cambridge, and the Speech and Hearing Research Group
(SpandH) at the University of Sheffield.

http://www.natural-speech-technology.org


---------- 
