On 11/12/25 19:08, Jason J.G. White wrote:
On 12/11/25 10:17, Aaron Chantrill wrote:
I'm working on an article for Linux Magazine. For this article, I'm
interested in talking about setting up Speech Dispatcher with different
text-to-speech engines, like Piper TTS or Coqui TTS. This is based on a
question from this mailing list a couple of months ago. I'm hoping to
start a series on accessibility issues while deepening my own
understanding.
For screen reader users, minimizing audio latency is important.
Unfortunately,
the neural network-based TTS systems, including Coqui and Piper, have
a reputation for producing high latency. This is an important reason
why screen reader users tend not to use them.
I don't know whether this is improved if you have appropriate GPU
processing for the neural network models. Piper was unusably slow on
my machine, but I didn't investigate deeply enough to find out whether
it was using the GPU.
When run as a command-line program, Piper is unusably slow because it has
to load the full ONNX model on every invocation. My goal is to use Piper's
built-in HTTP server. This is the same way the older mimic3-general.conf
module worked: the generic module hands each utterance to a long-running
web service instead of starting the synthesizer from scratch.
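On the Speech Dispatcher side, a generic module config roughly like the
following should do it. Treat this as a sketch rather than a tested
config: the file name, the localhost:5000 URL, and the voice name are my
assumptions, and you'd want to check the exact endpoint and port your
Piper HTTP server actually exposes.

# ~/.config/speech-dispatcher/modules/piper-generic.conf  (hypothetical name)
# Assumes Piper's HTTP server is already running and returns WAV audio
# for a POSTed text body; adjust the URL and port to your setup.
GenericExecuteSynth \
"printf %s \'$DATA\' | curl -s --data-binary @- --output - http://localhost:5000 | aplay -q"
GenericCmdDependency "curl"
GenericCmdDependency "aplay"
AddVoice "en" "MALE1" "en_US-lessac-medium"

You still have to register the module in speechd.conf, as with any other
generic module, but after that every utterance is just a quick curl
against the already-loaded model.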
Of course, writing an HTTP server front end that can hold a model in
memory isn't that difficult, so if another TTS program doesn't include a
web service, it shouldn't be hard to write one. Once the ONNX model is
loaded, Piper runs faster than real time (it takes longer to say the
output than to generate it) even on a Raspberry Pi 3, so latency and the
lack of a GPU shouldn't be an issue; the trade-off is that running an
additional web server as a service does introduce extra complexity.
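To make "isn't that difficult" concrete, here is a bare-bones Python
sketch of such a front end. It keeps a model resident in memory and
answers POST requests with WAV audio. The load_model()/synthesize() names
are placeholders for whatever Python API your TTS engine actually
provides (Piper, Coqui, etc.); the silence generator is only there so the
skeleton runs as-is.

#!/usr/bin/env python3
"""Minimal HTTP front end that keeps a TTS model loaded in memory."""
import io
import wave
from http.server import BaseHTTPRequestHandler, HTTPServer

# Load the model ONCE at startup -- this is the whole point compared with
# invoking a command-line synthesizer for every utterance.
# MODEL = load_model("/path/to/voice.onnx")   # placeholder for your engine

def synthesize(text: str) -> bytes:
    """Return WAV bytes for `text`. Placeholder: one second of silence."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(22050)    # 22.05 kHz
        wav.writeframes(b"\x00\x00" * 22050)
    return buf.getvalue()

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        text = self.rfile.read(length).decode("utf-8")
        audio = synthesize(text)
        self.send_response(200)
        self.send_header("Content-Type", "audio/wav")
        self.send_header("Content-Length", str(len(audio)))
        self.end_headers()
        self.wfile.write(audio)

if __name__ == "__main__":
    # The same shape of service the GenericExecuteSynth line above talks to.
    HTTPServer(("127.0.0.1", 5000), Handler).serve_forever()

That's roughly all a "hold the model in memory" service needs; the rest is
the Speech Dispatcher plumbing shown earlier.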
Thank you, Aaron