Hi guys,

I am looking for some advice on how to use a speech-to-text model with an
FPC program designed to teach reading of invented words composed of 8
Brazilian Portuguese phonemes (four consonants and four vowels).

So, right now (
https://github.com/cpicanco/stimulus-control-sdl2/blob/hanna/src/sdl.app.audio.recorder.devices.pas)
the program uses SDL2 to record short 4-5 s audio streams and save each
recording to a WAV file using fpwavwriter. Each audio stream/file is
supposed to be a word spoken by a student during a recording/playback
session of a word presented on screen. The participant will click a button
to finish the session. Then the program will start a speech-to-text
routine and give some feedback.
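
For readers not familiar with fpwavwriter, the saving step is essentially
the sketch below (simplified, not the exact code from the repository; I am
assuming fcl-sound's TWavWriter with its fmt fields and the StoreToFile /
WriteBuf methods, and a 16-bit mono 44.1 kHz format as an example):

  program SaveWavSketch;
  {$mode objfpc}{$H+}

  uses
    fpwavwriter; // TWavWriter, from FPC's fcl-sound package

  procedure SaveRecordingToWav(const AFileName: string;
    var ABuffer: array of SmallInt);
  var
    Writer: TWavWriter;
  begin
    Writer := TWavWriter.Create;
    try
      // 16-bit mono PCM; the sample rate should match the SDL2 audio spec
      Writer.fmt.Channels := 1;
      Writer.fmt.SampleRate := 44100;
      Writer.fmt.BitsPerSample := 16;
      Writer.fmt.BlockAlign := 2;        // Channels * BitsPerSample div 8
      Writer.fmt.ByteRate := 44100 * 2;  // SampleRate * BlockAlign
      if Writer.StoreToFile(AFileName) then
        Writer.WriteBuf(ABuffer[0], Length(ABuffer) * SizeOf(SmallInt));
    finally
      Writer.Free; // closes the file and finalizes the chunk sizes
    end;
  end;

  var
    Silence: array [0 .. 44099] of SmallInt; // one second of silence
  begin
    FillChar(Silence, SizeOf(Silence), 0);
    SaveRecordingToWav('test.wav', Silence);
  end.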

There will be two speech-to-text routines. The first one will be a human
transcription (nothing new here for me). The second one will be an AI
transcription.

I am looking for an approach to read the raw stream (or the saved file, if
direct stream input is not supported), pass it to a speech AI model (for
example, Whisper), and get back some text output for further processing.

Using Python with Whisper Medium (multilingual), I got good
(although slow) results without any fine-tuning. However, I am considering
using Transformers if fine-tuning turns out to be necessary.

So, in this context, what would be "the way to go" for using the final
model from Free Pascal? Calling a script with TProcess? Can you please
shed some light on this?
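
To frame the question, this is roughly what I have in mind (a minimal
sketch; transcribe.py would be a hypothetical wrapper script that loads
the Whisper model, transcribes the WAV file given as an argument, and
prints the text to stdout):

  program CallWhisper;
  {$mode objfpc}{$H+}

  uses
    SysUtils, Process;

  var
    Transcription: string;
  begin
    // RunCommand blocks until the script exits and captures its stdout,
    // which should be fine here because transcription only happens after
    // the participant has finished the session.
    if RunCommand('python3', ['transcribe.py', 'recording.wav'],
                  Transcription) then
      WriteLn('Transcription: ', Trim(Transcription))
    else
      WriteLn('Transcription failed.');
  end.

RunCommand comes from the Process unit and saves me from managing the
pipes manually, but I am not sure this is the best-supported approach.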

Best regards,
R