Dear all,

Please find below the data documentation and an initial plan for the ASR project. I would appreciate your feedback so that I can improve it and make it more readable and friendly. Please also note that all the data has been uploaded to the arabisc project SVN on SourceForge.

Speech Recognition Project

Overview:
The purpose of this project is to provide a foundation for speech recognition for the Arabic colloquial dialects, and to make it easier for other contributors to bring their expertise to the topic without having to dig into all of the required technical knowledge.

HMM model:
The Hidden Markov Models provided are not models of words. They are models of triphones: a triphone is a phoneme taken together with its left and right neighbours, and each phoneme is a single spoken sound. For example, السلام عليكم would be A-s-a-l-a-m-o-A-l-I-k-o-m as a sequence of single phonemes; in our case it becomes A+s, A-s+a, s-a+l, … etc., where l-p+r denotes phoneme p with left neighbour l and right neighbour r.
This of course makes data collection harder, because every recording we collect has to be transcribed at the phoneme level. It gives more accurate recognition, at the cost of making our model somewhat corpus-dependent, but that is still better than having one model per word, especially when cross-word triphones are taken into account.

Design approaches:
The proposed system uses HTK (the Hidden Markov Model Toolkit) as the base tool for creating, updating and testing the models, and also for building the language model. Our contribution will be a set of tools and scripts that make development and testing easier. A contributor will be able to keep his changes in one specific location and run only the scripts related to his contribution.

Suggested case studies:
The following case studies illustrate my vision for the project.
Signal Processing: A contributor applies a noise-removal algorithm to all the data, then runs a single script that updates the model and tests it to get the results of the change.
Language Modeling and Grammar Definition: A contributor changes the current language model and/or the grammar, then runs the relevant scripts to test the change.

Current Status:
We currently have about 1000 spoken sentences (recordings) for a corpus of around 100 sentences.
Some scripts are available, but a few manual tasks are still required to complete a training-testing cycle.

Why are two projects available, 8 kHz and 16 kHz?
Unfortunately, with HTK all spoken sentences should have the same sampling rate (there may be a way around this, but I do not know of one so far). I started with 16 kHz, but upsampling the lower-rate data degraded performance badly. I then switched all my work to 8 kHz and retested, and found that downsampling did not affect performance. So I am continuing with 8 kHz, which lets me take any data, downsample it and start testing, since 8 kHz is the lowest sampling rate commonly used in voice applications.

Short Term Plan.
1 - Automate the remaining scripts so that the train-test cycle is fully automated, and reorganize them so that any contributor can get familiar with them easily, preferably with a GUI tool.
2 - Provide an easy-to-use tool for adding syntax to the corpus and wav files, preferably a GUI tool.
3 - Collect as much syntax data as we can for the corpus, in order to build a better language model.
4 - Collect as many spoken sentences as we can, in order to build a better triphone model.
5 - Enhance the tied-state scripts (a linguistic expert is needed here).
6 - A signal processing wizard is needed to work his magic on the data we collect and test with.

Long Term Plan.
The long-term plan follows several different paths, which do not have to be done in sequence.
1 - Integrate with AT-SPI in order to control the desktop.
2 - Add training data for other Arabic colloquial dialects.
3 - Collect as much data as we can; this point still applies here too.
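
As a rough illustration of the triphone notation from the HMM model section, here is a minimal Python sketch that expands a phoneme sequence into context-dependent labels of the form l-p+r. It is not one of the project scripts: the function name to_triphones is hypothetical, and it treats the whole phrase as one sequence, ignoring word boundaries and silence. In HTK itself this expansion is normally done with the HLEd label editor.

```python
def to_triphones(phones):
    """Expand a list of phonemes into triphone labels.

    Each label is written l-p+r: phoneme p with left neighbour l and
    right neighbour r. The first and last phonemes only have one
    neighbour, so they come out as p+r and l-p.
    """
    labels = []
    for i, p in enumerate(phones):
        left = phones[i - 1] + "-" if i > 0 else ""
        right = "+" + phones[i + 1] if i < len(phones) - 1 else ""
        labels.append(left + p + right)
    return labels


# Phoneme transcription taken from the example in the HMM model section.
phones = "A s a l a m o A l I k o m".split()
print(" ".join(to_triphones(phones)))
# prints: A+s A-s+a s-a+l a-l+a l-a+m a-m+o m-o+A o-A+l A-l+I l-I+k I-k+o k-o+m o-m
```

The label m-o+A spans the boundary between the two words, which is the cross-word case mentioned above; a word-internal variant would restart the context at each word boundary instead.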
_______________________________________________
Developer mailing list
[email protected]
http://lists.arabeyes.org/mailman/listinfo/developer

