abe wrote:
Hi Liviu,
If you need to calculate acoustic similarity, you can use dynamic time
warping (DTW, http://en.wikipedia.org/wiki/Dynamic_time_warping ). It
splits the speech from the target and test files into frames, extracts
features for each frame, and aligns the two frame sequences in the best
possible way. The algorithm defines a cost for inserting and deleting
frames and for the dissimilarity of the features, so the overall cost (or
the cost normalized by the length of the file) provides a good measure of
difference. There are other refinements to the algorithm.
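A minimal sketch of the alignment step described above, in plain Python.
It is not code from CLAM, HTK, or the original Perl script. The function
name dtw_cost, the Euclidean frame distance, and the normalization by the
summed sequence lengths are all assumptions made for illustration. It also
uses the textbook DTW recurrence, where stretching one sequence against
the other costs only the local frame distance, rather than a separate
insertion or deletion penalty.

```python
import math


def dtw_cost(a, b, dist=None):
    """Align two sequences of feature vectors with textbook DTW.

    Returns (total_cost, normalized_cost). Normalizing by len(a) + len(b)
    is an assumption of this sketch; other conventions exist.
    """
    if dist is None:
        # Assumed local distance: Euclidean distance between two frames.
        dist = lambda x, y: math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both sequences need at least one frame")

    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances while b's frame is reused
                cost[i][j - 1],      # b advances while a's frame is reused
                cost[i - 1][j - 1],  # both advance (one-to-one match)
            )
    total = cost[n][m]
    return total, total / (n + m)


if __name__ == "__main__":
    same_words_slower = dtw_cost([[0.0], [1.0], [2.0]],
                                 [[0.0], [1.0], [1.0], [2.0]])
    print(same_words_slower)
```

A low normalized cost suggests the two recordings are acoustically
similar. Where to put the threshold for "distinct" depends on the features
and the data, so it has to be tuned by hand.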
CLAM can do the feature extraction, but I'm not sure whether it has the DTW
algorithm (I'm in the process of learning CLAM, so I'm not an expert), so
that might be something you'll have to do yourself. I've used this
methodology for comparing accents (native speaker vs. non-native speaker
reading the same sentences). Back then I used HTK for the feature
extraction and a Perl script for the DTW.
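To make the "windows the speech and extracts features for each frame" step
concrete, here is a rough sketch in plain Python. It does not use CLAM or
HTK. The function name frame_features, the frame and hop sizes (25 ms and
10 ms at an assumed 16 kHz sample rate), and the two features (log energy
and zero-crossing rate) are illustrative assumptions, chosen because they
are simple to compute. A real setup would more likely use something like
MFCCs.

```python
import math


def frame_features(samples, frame_len=400, hop=160):
    """Cut samples into overlapping frames and compute two features each.

    Returns a list of [log_energy, zero_crossing_rate] per frame.
    frame_len=400 and hop=160 correspond to 25 ms / 10 ms at 16 kHz,
    which is an assumption of this sketch.
    """
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        # Hamming window to soften the frame edges.
        windowed = [
            s * (0.54 - 0.46 * math.cos(2 * math.pi * k / (frame_len - 1)))
            for k, s in enumerate(frame)
        ]
        energy = sum(x * x for x in windowed)
        log_energy = math.log(energy + 1e-10)  # small offset avoids log(0)
        # Count sign changes between neighbouring samples.
        crossings = sum(
            1 for x, y in zip(frame, frame[1:]) if (x >= 0) != (y >= 0)
        )
        feats.append([log_energy, crossings / (frame_len - 1)])
    return feats


if __name__ == "__main__":
    # 0.1 s of a 440 Hz tone at the assumed 16 kHz sample rate.
    tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
    print(len(frame_features(tone)))
```

The resulting list of per-frame vectors is the kind of input a DTW
alignment expects, one list per audio file.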
Hope this helps and maybe stimulates more ideas for doing it with CLAM...
Abe, I found your explanation very interesting.
As you might imagine, CLAM does not have a DTW implementation, though it
would fit the framework quite well.
pau
Abe
David García Garzón wrote:
Hi, Liviu.
Do you mean speech recognition? Speaker recognition? Just sound
classification? What is the concrete use case? I am not sure what
you mean, but your statement seems too general to be achievable.
Depending on your purpose, you might be at the bleeding edge of research,
especially if you go to the semantic level.
CLAM currently has no sound classification system but it has many of
the building blocks such systems use. That's better than starting from
scratch.
A new document we added to the wiki [1] gives an overview of the
steps for getting started with CLAM:
[1] http://clam.iua.upf.edu/wikis/clam/index.php/Approaching_CLAM
If you need more help, just ask.
On Tuesday 12 June 2007 16:44:20 Liviu Macoviciuc wrote:
I am a newbie to CLAM and I don't understand much.
However, I need to write a program that says whether 2 audio files are
distinct.
For example, one file might contain a voice saying "I am John", and
another file the same voice or another voice saying "I am Bill".
Can anybody help me get started?
Best regards,
Liviu
_______________________________________________
CLAM mailing list
[email protected]
http://www.iua.upf.es/mtg/clam