We are currently working on a voice processing application. In this 
application we want to perform speaker recognition on audio files that users 
upload to our backend. A user records a voice clip in our iOS / Android mobile 
app and uploads it to the backend application. After receiving the audio, the 
backend should compare the voice against the user's existing voice samples and 
identify whether the user's voice is present in the newly uploaded audio file. 
Here is the flow:

1. The user creates a profile by entering email / mobile number and password, 
and uploads 4 sample voice files. Each sample can be 10-15 seconds long.
2. These voice samples are used for comparison when the user later uploads 
voice recordings.
3. When signup is complete, the application takes the user to the home screen.
4. The user records new speech in the application and uploads it to the server.
5. The server receives the file and validates it.
6. After validation, the server verifies whether the user's voice is present 
in the audio file by comparing it against the sample files the user uploaded 
at signup.
7. If the user's voice is identified in the audio file, we update the database 
to record that the user's voice was found in it, upload the audio file to AWS 
S3, and send a response back to the mobile app.
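For step 6, one common approach is to turn each recording into a fixed-size speaker embedding (using any pretrained speaker-embedding model, e.g. d-vectors or x-vectors from a library such as Resemblyzer or SpeechBrain) and then compare embeddings with cosine similarity. A minimal sketch of just the comparison logic, assuming the embeddings already exist; the 3-dimensional vectors and the 0.75 threshold are made-up illustrations (real embeddings have hundreds of dimensions, and the threshold must be calibrated on your own data):

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def average_embedding(embeddings):
    # Average the enrollment embeddings into a single "voiceprint".
    n = len(embeddings)
    return [sum(vals) / n for vals in zip(*embeddings)]

def is_same_speaker(enrolled_embeddings, new_embedding, threshold=0.75):
    # Compare the new utterance's embedding against the enrolled voiceprint.
    # The threshold is a tunable assumption, not a universal constant.
    voiceprint = average_embedding(enrolled_embeddings)
    return cosine_similarity(voiceprint, new_embedding) >= threshold

# Toy example with made-up 3-dimensional embeddings for the 4 signup samples:
samples = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1],
           [0.95, 0.05, 0.0], [0.85, 0.15, 0.05]]
same_speaker = [0.88, 0.12, 0.02]      # close to the enrolled voice
other_speaker = [0.1, 0.2, 0.95]       # a different voice
print(is_same_speaker(samples, same_speaker))   # True
print(is_same_speaker(samples, other_speaker))  # False
```

The quality of the result depends almost entirely on the embedding model, not on this comparison code; the same cosine-threshold decision is what most verification toolkits apply under the hood.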

All registered users of our application should be able to upload their audio 
files, and our backend should perform speaker recognition as described above. 
We expect at least 80% accuracy when identifying the user's voice. We tried 
the Speaker Recognition API provided by Azure, but the accuracy was really 
poor. We also tried the Bob Bio Spear library; it works fine with its 
predefined sample audio files, but not with our audio files.

This requirement may look similar to Shazam, but it is not. Shazam compares 
the recorded music exactly against the songs in its database (audio 
fingerprinting), so the voice and music must match a specific stored track. 
In our case, we want to compare the user's voice itself. When the sample 
voice is recorded, the user can speak any sentence. Later, the uploaded audio 
may contain completely different speech, yet it must still be matched against 
his sample voice (text-independent speaker verification). There is also the 
possibility of background noise. We have to compare only the user's voice, 
not what he speaks.

If you have any suggestions on this, please reply to this thread. If you are 
willing to work on this as a freelancer, please drop an email to 
vign...@infinodo.com. Thanks for your time.

BangPypers mailing list
