Re: [on-asterisk] TAUG Meeting & Social, TONIGHT, Wednesday September 27th, 2006

Mike Ashton 27 Sep 2006 16:55:55 -0000

Duane,

I believe Simon is using version 2 of Sphinx (an older version written in C). There are newer versions: Sphinx 3 (written in C++) and Sphinx 4 (written in Java). There are various pros and cons to each. Ver 2 is fast and can handle about 10 simultaneous streams on a P3-700, but its accuracy is not as good. If you are doing just digits, though, there is a digit-only dictionary. Versions 3 and 4 are the only ones being actively developed.

Now, I'm not familiar with ver 2, but we use ver 4. It is slower and requires a lot more resources, but using just the digit dictionary it hits about a 99.4% accuracy rate (I could not locate ver 2's numbers, but if I recall it was around 94%). The acoustic model that they all used has a sampling of 200 unique voices (it was done by Texas Instruments). We use a larger custom dictionary with about 300 words and achieve about a 97% accuracy rate.
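To put those percentages in terms of whole phone numbers, here is a rough back-of-the-envelope sketch. It assumes the quoted figures are per-digit accuracies and that errors on each digit are independent; neither assumption comes from the Sphinx documentation, and the 94% figure for ver 2 is only a recollection. The function name is made up for illustration.

```python
def whole_number_accuracy(per_digit_accuracy, digit_count):
    """Chance that every digit in a spoken number is recognized correctly.

    Assumes each digit is recognized independently with the same accuracy,
    which is a simplification for illustration only.
    """
    return per_digit_accuracy ** digit_count


# Per-digit figures quoted above, applied to a 10-digit number.
for label, accuracy in (("ver 4", 0.994), ("ver 2", 0.94)):
    print(f"{label}: {whole_number_accuracy(accuracy, 10):.2f}")
# prints "ver 4: 0.94" then "ver 2: 0.54"
```

Under those assumptions, a few points of per-digit accuracy make a large difference once a caller has to speak a full 10-digit number, so reading the number back for confirmation is probably worth the extra prompt.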

One thing to keep in mind: ver 2 works most easily with telco audio (16-bit, 8 kHz) and is the easiest to implement. Ver 4 wants 16-bit, 16 kHz audio, so for our use we process the audio after it has been recorded on a separate server, upsampling the recording to get it to work with version 4. We process various voice and accent combinations (lots of Hispanic and Asian accents), and yes, with some people it can be challenging, but typically that is due to word pronunciation, not the accent. For example, some accents have trouble saying the H, so THREE is pronounced TREE. But you can add these things to your dictionary as you discover its shortcomings.
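The upsampling step mentioned above could look roughly like the sketch below. This is not the code we run; it is a minimal illustration that doubles the sample rate of a mono, 16-bit, 8 kHz WAV file by linear interpolation, using only the Python standard library. The function names and file paths are hypothetical, and a production resampler would normally use a proper low-pass interpolation filter instead.

```python
import array
import wave


def upsample_2x(samples):
    """Double the sample rate by inserting the midpoint between neighbours.

    The last sample is repeated, so the output is exactly twice as long.
    """
    out = []
    for i, current in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < len(samples) else current
        out.append(current)
        out.append((current + nxt) // 2)  # midpoint stays within 16-bit range
    return out


def upsample_wav_8k_to_16k(src_path, dst_path):
    """Read a mono 16-bit 8 kHz WAV file and write a 16 kHz copy.

    Assumes a little-endian host, since WAV samples are little-endian and
    array("h") uses the machine's native byte order.
    """
    with wave.open(src_path, "rb") as src:
        fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
        if fmt != (1, 2, 8000):
            raise ValueError("expected mono 16-bit 8 kHz audio")
        samples = array.array("h", src.readframes(src.getnframes()))

    doubled = array.array("h", upsample_2x(samples))

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(16000)
        dst.writeframes(doubled.tobytes())
```

Upsampling only changes the format the recognizer sees. It does not add any information above the 4 kHz that telephone audio carries to begin with.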


Hope that helps.

Mike

Duane wrote:

Simon P. Ditner wrote:

Tonight, Simon Ditner will be speaking about Voice Recognition using
Sphinx2 with Asterisk, and will be demonstrating his open source project
ZoIP.org, a bridge for text adventures and the telephone using
speech-to-text and text-to-speech.


I'm unfortunately not able to make it, but I've had a problem for a
while now, and not been able to come up with a solution for it.

I've been trying to come up with a method of allowing people to list
their do-not-call preference in our zone (e164.org). We store records in
NAPTR form, but numerous problems with DTMF detection and such have
caused too many issues to go forward with a telephone-only solution.

How well does the speech-to-text stuff work (including handling
accents), if the only responses are digits?


--
Mike Ashton

Quality Track Intl

Ph:     647-722-2092 x 251
Cell:   416-527-4995
Fax:    416-352-6043

QTI CONFIDENTIAL AND PROPRIETARY INFORMATION

The contents of this material are confidential and proprietary to Quality Track International, Inc. and may not be reproduced, disclosed, distributed or used without the express permission of an authorized representative of QTI.
Use for any purpose or in any manner other than that expressly authorized is prohibited.
If you have received this communication in error, please immediately delete it and all copies, and promptly notify the sender.

