On 2025-08-14 22:23, Rob Hallam wrote:
On Thu, 14 Aug 2025 at 22:15, Bernhard Döbler <program...@bardware.de> wrote:
Yesterday, news made the rounds that ffmpeg 8 is going to be released
soon and that it will include Whisper, an AI software that can transcribe
speech and create subtitles.
The project's GitHub page, https://github.com/ggml-org/whisper.cpp, says
they offer a handful of models:
Model    Disk      Mem
tiny     75 MiB    ~273 MB
base     142 MiB   ~388 MB
small    466 MiB   ~852 MB
medium   1.5 GiB   ~2.1 GB
large    2.9 GiB   ~3.9 GB
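
To make the trade-off in the table concrete, here is a minimal Python sketch that picks the largest model whose approximate memory figure fits a given budget. The numbers are copied from the table above (with GB converted to MB at 1 GB = 1000 MB); the name pick_model and the idea of a single memory budget are illustrative assumptions and not part of whisper.cpp or ffmpeg.

```python
from typing import Optional

# Approximate memory use per model in MB, taken from the table above.
# Ordered from smallest to largest.
MODEL_MEM_MB = [
    ("tiny", 273),
    ("base", 388),
    ("small", 852),
    ("medium", 2100),
    ("large", 3900),
]


def pick_model(budget_mb: float) -> Optional[str]:
    """Return the largest model whose approximate memory use fits the budget.

    Returns None if even the smallest model does not fit.
    """
    best = None
    for name, mem_mb in MODEL_MEM_MB:
        if mem_mb <= budget_mb:
            best = name  # keep going: later entries are larger models
    return best
```

With a budget of roughly 1 GB this selects "small". The figures are approximate, so some headroom is sensible in practice.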
There is a commit [1] adding Whisper support [2]. As the docs note, you
will need to provide a model.
How does this work? Will all of this be compiled into the ffmpeg binary?
A --enable-whisper configure option has been added (default: no) [3], so
whether it is compiled in is up to whoever builds your binary, and you
provide the model yourself.
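
As a rough sketch of what that means in practice, the Python helpers below check a build configuration string for --enable-whisper and assemble an ffmpeg command line that runs the whisper filter. The filter options used (model, language, destination, format) are assumptions based on the filter documentation [2] and should be checked against your ffmpeg version; the function names and all file names are placeholders.

```python
from typing import List


def has_whisper(buildconf_text: str) -> bool:
    """Return True if the build configuration text lists --enable-whisper."""
    # Split on whitespace so the flag must match as a whole token.
    return "--enable-whisper" in buildconf_text.split()


def whisper_command(input_path: str, model_path: str, srt_path: str,
                    language: str = "en") -> List[str]:
    """Build an ffmpeg argument list that writes subtitles via the whisper filter.

    Option names are assumed from the filter documentation. Paths that
    contain ':' would need escaping, which this sketch does not handle.
    """
    filter_spec = (
        f"whisper=model={model_path}"
        f":language={language}"
        f":destination={srt_path}"
        ":format=srt"
    )
    # -vn drops video; "-f null -" discards the audio output, since the
    # subtitles are written by the filter itself.
    return ["ffmpeg", "-i", input_path, "-vn", "-af", filter_spec,
            "-f", "null", "-"]
```

The text to pass to has_whisper can be taken from the output of `ffmpeg -hide_banner -buildconf`, and the list returned by whisper_command can be handed to subprocess.run once a model file has been downloaded.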
[1]: https://github.com/FFmpeg/FFmpeg/commit/13ce36fef98a3f4e6d8360c24d6b8434cbb8869b
[2]: https://ffmpeg.org/ffmpeg-filters.html#whisper-1
[3]: https://github.com/FFmpeg/FFmpeg/blob/47c6af7d299c96b2e65f5f10526e0f34e00b23c8/configure#L339
Broadening the question somewhat: is there an existing AI tool that could
process recordings containing both speech and music and highlight or
extract the parts that contain music, say by creating cut points?
Does anyone here know if this is possible?
_______________________________________________
ffmpeg-user mailing list
ffmpeg-user@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-user