Duane,
I believe Simon is using version 2 of Sphinx (an older version
written in C). There are newer versions: Sphinx 3 (written in C++) and
Sphinx 4 (written in Java). There are various pros and cons to each. Ver2
is fast and can handle about 10 simultaneous streams on a P3-700, but its
accuracy is not as good. If you are doing just digits, though, there is a
digit-only dictionary. Versions 3 and 4 are the only ones being actively
developed.
Now, I'm not familiar with ver2, but we use ver4. It is slower and
requires a lot more resources, but using just the digit dictionary it
hits about a 99.4% accuracy rate (I could not locate ver2's numbers, but
if I recall it was around 94%). The acoustic model that they all used
has a sampling of 200 unique voices (it was done by Texas Instruments).
We use a larger custom dictionary with about 300 words and achieve
about a 97% accuracy rate.
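To put those figures in terms of a whole phone number, here is a rough sketch. It assumes the quoted rates are per digit and that errors on each digit are independent; neither assumption comes from the Sphinx numbers themselves, and the function name is purely illustrative.

```python
def whole_number_accuracy(per_digit_accuracy: float, num_digits: int) -> float:
    """Chance that every digit of a number is recognized correctly.

    Assumption: each digit is recognized independently with the same
    accuracy, which is a simplification of real recognizer behavior.
    """
    if not 0.0 <= per_digit_accuracy <= 1.0:
        raise ValueError("per_digit_accuracy must be between 0 and 1")
    if num_digits < 0:
        raise ValueError("num_digits must not be negative")
    return per_digit_accuracy ** num_digits


if __name__ == "__main__":
    # Rates taken from the message above; the ver2 figure is from memory.
    for label, accuracy in (("ver4 digits", 0.994), ("ver2 digits", 0.94)):
        chance = whole_number_accuracy(accuracy, 10)
        print(f"{label}: 10-digit number fully correct about {chance:.0%} of the time")
```

Under those assumptions, a 10-digit number comes through intact roughly 94% of the time at 99.4% per digit, versus roughly 54% at 94% per digit.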
One thing to keep in mind: ver2 works most easily with telco audio
(16-bit, 8 kHz) and is the easiest to implement. Ver4 wants 16-bit, 16 kHz
audio, so for our use we process the audio on a separate server after it
has been recorded and upsample it to get it to work with version 4. We
process various voice and accent combinations (lots of Hispanic and Asian
accents), and yes, with some people it can be challenging, but typically
that is due to word pronunciation, not the accent. For example, some
accents have trouble saying the H, so THREE is pronounced TREE. But you can
add these things to your dictionary as you discover its shortcomings.
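A minimal sketch of adding such a variant, assuming the dictionary is a CMU-style plain-text file (one word per line followed by its phones, with extra pronunciations labeled WORD(2), WORD(3), and so on). Check the dictionary shipped with your Sphinx version before relying on this layout; the helper names and phone symbols here are illustrative.

```python
def add_pronunciation(dictionary: dict, word: str, phones: str) -> None:
    """Record a pronunciation for a word, skipping exact duplicates."""
    variants = dictionary.setdefault(word.upper(), [])
    phone_seq = tuple(phones.split())
    if phone_seq not in variants:
        variants.append(phone_seq)


def format_dictionary(dictionary: dict) -> list:
    """Render entries as lines; alternates get a (2), (3), ... suffix."""
    lines = []
    for word in sorted(dictionary):
        for index, phone_seq in enumerate(dictionary[word], start=1):
            label = word if index == 1 else f"{word}({index})"
            lines.append(f"{label} {' '.join(phone_seq)}")
    return lines


if __name__ == "__main__":
    digits = {}
    add_pronunciation(digits, "THREE", "TH R IY")  # standard pronunciation
    add_pronunciation(digits, "THREE", "T R IY")   # "TREE"-style variant
    print("\n".join(format_dictionary(digits)))
```

Keeping the variant under the same word means the recognizer still returns THREE whichever way it was said, so the calling application does not need to know about the accent.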
Hope that helps.
Mike
Duane wrote:
Simon P. Ditner wrote:
Tonight, Simon Ditner will be speaking about Voice Recognition using
Sphinx2 with Asterisk, and will be demonstrating his open source project
ZoIP.org, a bridge for text adventures and the telephone using
speech-to-text and text-to-speech.
I'm unfortunately not able to make it, but I've had a problem for a
while now and have not been able to come up with a solution for it.
I've been trying to come up with a method of allowing people to list
their do-not-call preference in our zone (e164.org). We store records in
NAPTR form, but numerous problems with DTMF detection and such have
caused too many issues to go forward with a telephone-only solution.
How well does the speech-to-text stuff work (including handling
accents), if the only responses are digits?
--
Mike Ashton
Quality Track Intl
Ph: 647-722-2092 x 251
Cell: 416-527-4995
Fax: 416-352-6043