Duane,

I believe Simon is using version 2 of Sphinx (an older version written in C). There are newer versions: Sphinx 3 (also written in C) and Sphinx 4 (written in Java). There are various pros and cons to each. Ver2 is fast and can handle about 10 simultaneous streams on a P3-700, but its accuracy is not as great. If you are doing just digits, though, there is a digit-only dictionary. Versions 3 and 4 are the only ones being actively developed.

Now, I'm not familiar with ver2, but we use ver4. It is slower and requires a lot more resources, but using just the digit dictionary it hits about a 99.4% accuracy rate (I could not locate ver2's numbers, but if I recall it was around 94%). The acoustic model that they all used has a sampling of 200 unique voices (it was done by Texas Instruments). We use a larger custom dictionary with about 300 words and achieve about a 97% accuracy rate.

One thing to keep in mind: ver2 works most easily with telco audio (16-bit, 8 kHz) and is the easiest to implement. Ver4 wants 16-bit, 16 kHz audio, so for our use we process the audio on a separate server after it has been recorded, upsampling it to get it to work with version 4. We process various voice and accent combinations (lots of Hispanic and Asian accents), and yes, with some people it can be challenging, but typically that is due to word pronunciation, not the accent. For example, some accents have trouble saying the H, so THREE is pronounced TREE. But you can add these things to your dictionary as you discover its shortcomings.
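To illustrate the upsampling step mentioned above, here is a minimal Python sketch that doubles the sample rate of a recorded 16-bit mono WAV file (for example 8 kHz to 16 kHz). It is not our production code: the function names are made up for this example, and plain linear interpolation is used only because it is easy to follow. A proper resampler would apply a low-pass filter, and upsampling adds no new information above the original bandwidth.

```python
import array
import sys
import wave


def upsample_2x(samples):
    """Double the sample rate of 16-bit PCM samples by linear interpolation.

    Each input sample is followed by the midpoint between it and the next
    sample; the last sample is simply repeated.
    """
    out = array.array("h")
    n = len(samples)
    for i in range(n):
        cur = samples[i]
        nxt = samples[i + 1] if i + 1 < n else cur
        out.append(cur)
        out.append((cur + nxt) // 2)
    return out


def upsample_wav(src_path, dst_path):
    """Read a 16-bit mono WAV file and write a copy at twice the sample rate."""
    with wave.open(src_path, "rb") as src:
        if src.getsampwidth() != 2 or src.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        rate = src.getframerate()
        pcm = array.array("h")
        pcm.frombytes(src.readframes(src.getnframes()))

    # WAV data is little-endian; swap on big-endian hosts.
    if sys.byteorder == "big":
        pcm.byteswap()
    doubled = upsample_2x(pcm)
    if sys.byteorder == "big":
        doubled.byteswap()

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate * 2)
        dst.writeframes(doubled.tobytes())
```

Calling `upsample_wav("in_8k.wav", "out_16k.wav")` (hypothetical file names) would produce a file with the same duration and twice as many samples.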
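As a rough sketch of what adding a pronunciation to the dictionary can look like, the Python helper below inserts an alternate entry for THREE that sounds like TREE. It assumes a CMU-style plain-text dictionary, where each line is a word followed by its phones and alternates are numbered WORD(2), WORD(3), and so on. The helper name and the phone symbols shown are assumptions for illustration; the phones you use must match whatever set your acoustic model was trained with, and that can differ between models.

```python
def add_variant(lines, word, phones):
    """Return a copy of `lines` with an extra pronunciation for `word`.

    `lines` is a list of dictionary entries such as "THREE  TH R IY".
    The new entry is placed right after the existing entries for the
    same word and numbered in CMU style: WORD(2), WORD(3), ...
    """
    last_index = -1
    count = 0
    for i, line in enumerate(lines):
        fields = line.split()
        # Strip any "(n)" suffix so THREE and THREE(2) count as one word.
        if fields and fields[0].split("(")[0] == word:
            last_index = i
            count += 1

    label = word if count == 0 else f"{word}({count + 1})"
    entry = f"{label}  {phones}"
    pos = last_index + 1 if last_index >= 0 else len(lines)
    return lines[:pos] + [entry] + lines[pos:]


# Example: let callers who drop the H in THREE still be recognized.
dictionary = ["THREE  TH R IY", "TWO  T UW"]
updated = add_variant(dictionary, "THREE", "T R IY")
```

The original list is left untouched, so you can compare recognition results before and after the change.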

Hope that helps.

Mike

Duane wrote:
Simon P. Ditner wrote:
Tonight, Simon Ditner will be speaking about Voice Recognition using
Sphinx2 with Asterisk, and will be demonstrating his open source project
ZoIP.org, a bridge for text adventures and the telephone using
speech-to-text and text-to-speech.

I'm unfortunately not able to make it, but I've had a problem for a
while now, and not been able to come up with a solution for it.

I've been trying to come up with a method of allowing people to list
their do not call preference in our zone (e164.org), we store records in
NAPTR form, but numerous problems with DTMF detection and such have
caused too many issues from going forward with a telephone only solution.

How well does the speech-to-text stuff work (including handling
accents), if the only responses are digits?


--
Mike Ashton

Quality Track Intl

Ph:     647-722-2092 x 251
Cell:   416-527-4995
Fax:    416-352-6043

