[developers] ASR - Initial Documentation & Plan

Ahmad Sayed Mon, 09 Aug 2010 09:42:17 -0700

Dear all,

Please find below the data documentation and an initial plan for the ASR
project. I would appreciate your feedback so that I can improve it and make
it more readable and friendly.
Please also note that all the data has been uploaded to the arabisc project
SVN on SourceForge.



Speech Recognition Project:

Overview:
The purpose of this project is to provide a foundation for speech
recognition for colloquial Arabic dialects, and to make it easier for other
contributors to bring their expertise to this topic without having to dig
into all the required technical knowledge.

HMM model:
The Hidden Markov Models provided are not models of words; they are models
of triphones. A triphone is a combination of three phonemes (a phoneme
together with its left and right neighbours), and each phoneme is one
spoken sound, e.g.
السلام عليكم (as-salamu alaykum, "peace be upon you")
would be A-s-a-l-a-m-o-A-l-I-k-o-m with single-phoneme models,
and in our case A+s, A-s+a, s-a+l, ... etc.

This of course makes data collection more challenging, because every
recording we collect has to be transcribed at the phoneme level.
In return it gives more accurate recognition results, at the cost of making
our model somewhat corpus dependent. That is still better than having a
model for each word, especially when cross-word triphones are considered.
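
As a rough illustration of the expansion described above, here is a minimal
Python sketch that turns a phoneme sequence into context-dependent labels
written as left-phone+right. The function name to_triphones is hypothetical
and is not one of the project scripts; in practice HTK's own label editing
tools would do this step.

```python
def to_triphones(phones):
    """Expand a phoneme list into labels of the form left-phone+right.

    The first and last phonemes only have one neighbour, so they come out
    as biphones (for example "A+s" and "o-m").
    """
    labels = []
    for i, phone in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + phone + right)
    return labels


# Phoneme transcription taken from the example in this message.
phones = "A s a l a m o A l I k o m".split()
print(" ".join(to_triphones(phones)))
```

Every recording needs such a phoneme transcription before it can be used,
which is where most of the data collection effort goes.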

Design approaches:
The proposed system uses HTK (the Hidden Markov Model Toolkit) as the base
tool for creating, updating and testing the models, and also for building
the language model.
Our contribution will be a set of tools and scripts that make development
and testing easier.

A contributor will be able to keep his changes in one specific location,
and run only the scripts related to his contribution.
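
One way this could look is sketched below in Python. The stage names and
script paths are invented for illustration and do not refer to files in the
SVN repository; the idea is only that a contributor names the stage he
changed and everything from that stage onward is re-run.

```python
import subprocess

# Hypothetical stage names and script paths, for illustration only.
STAGES = [
    ("signal", ["sh", "scripts/clean_audio.sh"]),
    ("features", ["sh", "scripts/extract_features.sh"]),
    ("train", ["sh", "scripts/train_models.sh"]),
    ("test", ["sh", "scripts/run_tests.sh"]),
]


def stages_from(start, stages=STAGES):
    """Return the stage called `start` and every stage after it."""
    names = [name for name, _ in stages]
    if start not in names:
        raise ValueError(f"unknown stage {start!r}; expected one of {names}")
    return stages[names.index(start):]


def run_from(start, runner=subprocess.run):
    """Run each remaining stage in order, stopping at the first failure."""
    for name, command in stages_from(start):
        print(f"running stage: {name}")
        runner(command, check=True)
```

For example, run_from("train") would skip the signal and feature stages and
only retrain and retest.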


Suggested case studies:
The following case studies demonstrate my vision for the project.
Signal Processing:
The contributor applies a noise-removal algorithm to all the data, then runs
a single script that updates the model, tests it, and reports the results of
the change.

Language Modeling and Grammar Definition:
The contributor changes the current language model and/or the grammar, then
runs the relevant testing scripts.

Current Status:
We currently have about 1000 spoken sentences covering a corpus of around
100 sentences.
Some scripts are available, but a few manual tasks are still required to
complete a training-testing cycle.
Why are two projects available, 8 kHz and 16 kHz?
Unfortunately, with HTK all spoken sentences must have the same sampling
rate (there may be a way around this, but I have not found one yet). I
started with 16 kHz, but upsampling the lower-rate data degraded the
performance badly. I then switched all my work to 8 kHz and retested, and
found that downsampling did not affect the performance. So I am continuing
with 8 kHz, which lets me take any data, downsample it and start testing,
since 8 kHz is the lowest sampling rate commonly used in voice-related
applications.
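
For reference, a minimal Python sketch of bringing a 16 kHz recording down
to 8 kHz follows. It is not the method used in the project: it assumes
16-bit mono WAV input, the function names are hypothetical, and averaging
adjacent samples is only a crude low-pass filter. A dedicated resampling
tool should give cleaner results.

```python
import array
import sys
import wave


def halve_rate(samples):
    """Average each pair of adjacent samples (crude low-pass, then keep one per pair)."""
    return [(samples[i] + samples[i + 1]) // 2
            for i in range(0, len(samples) - 1, 2)]


def downsample_wav_16k_to_8k(src_path, dst_path):
    """Read a 16-bit mono 16 kHz WAV file and write an 8 kHz copy."""
    with wave.open(src_path, "rb") as src:
        if (src.getnchannels(), src.getsampwidth(), src.getframerate()) != (1, 2, 16000):
            raise ValueError("expected 16-bit mono 16 kHz input")
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))
    if sys.byteorder == "big":
        pcm.byteswap()  # WAV samples are stored little-endian
    out = array.array("h", halve_rate(pcm))
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(8000)
        dst.writeframes(out.tobytes())
```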


Short Term Plan.
1 - Automate the remaining scripts so that the train-test cycle is fully
automated, and reorganize them so that any contributor can easily get
familiar with them, preferably with a GUI tool.
2 - Provide an easy-to-use tool for adding syntax to the corpus and wav
files, preferably a GUI tool.
3 - Collect as much syntax data as we can for the corpus, to build a better
language model.
4 - Collect as many spoken sentences as we can, to build a better triphone
model.
5 - Enhance the tied-state scripts (a linguistic expert is needed here).
6 - A signal processing wizard is needed to work his magic on the data we
collect and test with.

Long Term Plan.
The long-term plan follows several independent paths, which need not be done
in sequence.
1 - Integrate it with AT-SPI in order to control the desktop.
2 - Add training data for other Arabic colloquial dialects.
3 - Collect as much data as we can; this point is still needed here too.
_______________________________________________
Developer mailing list
http://lists.arabeyes.org/mailman/listinfo/developer
