A new article is available in IPOL: https://www.ipol.im/pub/art/2022/427/

Sam Perochon,
A Presentation and Short Discussion of rVAD-fast, a Fast Voice Activity Detector,
Image Processing On Line, 12 (2022), pp. 404–419.
https://doi.org/10.5201/ipol.2022.427

Abstract
Voice activity detection (VAD) usually refers to the detection of human voices in acoustic signals and is often used as a pre-processing step in numerous audio signal processing tasks. The unsupervised method proposed here was originally developed by Zheng-Hua Tan, Achintya Kr. Sarkar and Najim Dehak [Computer Speech & Language, 2020] and consists of a robust segment-based approach. The voice activity detection stage follows two denoising steps. The first detects high-energy segments using an a posteriori SNR-weighted energy difference, and the second enhances the speech using the MSNE-mod approach. Use cases and downstream tasks include intrusion detection, speech-to-text, speaker diarization, and emotion estimation.
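
To give a rough idea of the first denoising step, here is a minimal sketch of a frame-level a posteriori SNR-weighted energy difference. It is not the authors' implementation. The function names, the frame parameters, and the use of the minimum frame energy as the noise estimate are assumptions made for illustration, and the formula d(m) = sqrt(|e(m) - e(m-1)| * max(SNR_post(m), 0)) is one common formulation assumed here.

```python
import math

def frame_energies(samples, frame_len, hop):
    """Return the sum of squared samples for each frame (hypothetical helper)."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame))
    return energies

def snr_weighted_energy_difference(energies, noise_energy, eps=1e-12):
    """Weight the frame-to-frame energy change by the a posteriori SNR in dB.

    Assumed formulation: d(m) = sqrt(|e(m) - e(m-1)| * max(SNR_post(m), 0)),
    with SNR_post(m) = 10 * log10(e(m) / noise_energy).
    """
    d = [0.0]  # no previous frame for m = 0
    for m in range(1, len(energies)):
        snr_post = 10.0 * math.log10((energies[m] + eps) / (noise_energy + eps))
        diff = abs(energies[m] - energies[m - 1])
        d.append(math.sqrt(diff * max(snr_post, 0.0)))
    return d

# Toy signal: a quiet stretch followed by a loud one.
signal = [0.01] * 40 + [1.0] * 40
e = frame_energies(signal, frame_len=10, hop=10)
noise = min(e)  # crude stand-in for a real noise estimate (assumption)
d = snr_weighted_energy_difference(e, noise)
onset = d.index(max(d))  # frame where the weighted difference peaks
```

In this toy example the measure is zero inside the stationary stretches and peaks at the frame where the loud segment begins, which is the kind of cue a high-energy segment detector could threshold.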




--
IPOL - Image Processing On Line   - http://ipol.im/

contact     [email protected]          - http://www.ipol.im/meta/contact/
news+feeds  twitter @IPOL_journal - http://www.ipol.im/meta/feeds/
announces   [email protected] - http://tools.ipol.im/mm/announce/
discussions [email protected]  - http://tools.ipol.im/mm/discuss/