On 12/23/2024 11:51 PM, Ken Fallon wrote:
I don't think that would make a huge difference but looking at the diffs, I do
see a change from
ffmpeg -y -i hpr${ep_num}.wav -af loudnorm=I=-16:LRA=11:TP=-1.5
hpr${ep_num}_norm.wav >> ${fname}_tmp.log 2>&1
This is good; the audio filter ("-af loudnorm=I=-16:LRA=11:TP=-1.5") applies
some fairly basic normalization to -16 LUFS. This is a good setting for a
podcast, and the prior episodes reflect that. It preserves the dynamic range
of the source while ensuring the average loudness is at a set level.
to
ffmpeg -y -i hpr${ep_num}_sandwitch.wav -filter:a "dynaudnorm=p=0.9:s=5"
hpr${ep_num}_norm.wav >> ${fname}_tmp.log 2>&1
This is not good. This replaces the prior LUFS normalization with a dynamic
audio normalizer that tries to keep peaks at 90% (p=0.9) and, going by the
ffmpeg documentation, also applies compression with a factor of 5 (s=5; s is
the compress factor, which is off by default). This explains what I'm hearing:
why the intro music is quiet for the first 5 seconds, then suddenly loud when
the speaker is talking; and why the quiet parts aren't quiet any more. Peak
loudness != average loudness, so this isn't really appropriate IMO.
This new audio filter is essentially a dynamic compressor. It makes all parts
of the audio loud even when the speaker is deliberately talking softer. A
side effect is that some portions now sound distorted. It reflects badly on
the submitter, because it makes them sound like they don't know how to master
audio.
I can see two ways to fix this:
1. Return to the prior audio filter, as there's nothing wrong with it. It
ensures that all episodes have the same average level while preserving their
dynamics. (It also works fine with stereo input, so I'm not sure why the
filter was changed to dynaudnorm.)
2. Fix the dynaudnorm filter by:
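- change s=5 to s=0 (the default value; per the ffmpeg docs s is the compress
factor, so this turns the extra compression off), and raise g (gausssize) if
the gain changes should ride out a larger window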
- Add r (targetrms) parameter so that an average loudness level is
attempted, not just smoothing out the peaks. The value ranges from 0.0 to 1.0
and I don't know what a good value might be, so this would need to be derived.
I strongly suggest going with #1.
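
One way to compare the two options by measurement rather than by ear is to have loudnorm report what it measures. A rough sketch, assuming ffmpeg is on the PATH; the helper name measure_loudness is made up for illustration and is not part of the HPR scripts:

```shell
#!/bin/sh
# Sketch only: measure_loudness is a hypothetical helper, not part of the HPR scripts.
measure_loudness() {
    # print_format=json makes loudnorm print its measurements at the end of the
    # run; "-f null -" decodes and filters the audio but writes no output file.
    # ffmpeg logs to stderr, so redirect it to stdout so it can be captured.
    ffmpeg -hide_banner -nostats -i "$1" \
        -af "loudnorm=I=-16:LRA=11:TP=-1.5:print_format=json" \
        -f null - 2>&1
}
```

Running "measure_loudness hpr1234_norm.wav" (file name is a placeholder) on a normalized file should, if I have this right, show an input_i value close to -16 when the loudnorm option was used.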
To be honest I have no idea why that change was made, but I usually only make
changes when something broke. Do you think that would account for the change?
I can't see how loudnorm would break anything...
I would like to automate as much as possible, while ensuring we keep the
manual quality control steps that prevent the majority of problems getting
through.
It should be possible; I'd suggest this workflow (which might already be the
case):
- Convert whatever the input is to .wav with the loudnorm=I=-16:LRA=11:TP=-1.5
filter
- Run ffmpeg once for every output target (ogg/mp3/etc.), with the previous
normalized .wav as input
- Remove all temp files
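
The steps above could be sketched roughly like this. The run helper, the DRY_RUN switch, the function name and the file names are all assumptions on my part and not taken from the real scripts, and encoder options for each target are left out:

```shell
#!/bin/sh
# Sketch of the workflow above. The run helper, the DRY_RUN switch and the
# file names are illustrative assumptions, not the real HPR scripts.
set -eu

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

process_episode() {
    input="$1"
    ep_num="$2"
    norm="hpr${ep_num}_norm.wav"

    # 1. Convert whatever the input is to a loudness-normalized .wav.
    run ffmpeg -y -i "${input}" -af "loudnorm=I=-16:LRA=11:TP=-1.5" "${norm}"

    # 2. One ffmpeg run per output target, always from the normalized .wav.
    for ext in ogg mp3; do
        run ffmpeg -y -i "${norm}" "hpr${ep_num}.${ext}"
    done

    # 3. Remove the temp file.
    run rm -f "${norm}"
}
```

Setting DRY_RUN=1 before calling "process_episode upload.flac 1234" prints the four commands without touching any files, which makes it easy to review before a real run.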
Can audiophile people please subscribe to, and report issues with audio in,
the future feed <https://hackerpublicradio.org/rss-future.php>. Once the file
has hit the main feed it's already too late to replace the file for anyone
that downloaded it.
I can't guarantee I'll always listen to the future feed as I only have the
opportunity to listen a few times a month, but I'm happy to continue pointing
out anything I hear that goes against best practices.
Anyone willing to help with the transformation scripts and documentation
should please create an account on our Git Repo
https://repo.anhonesthost.net/user/sign_up, and _email me 1:1 so I can approve
your account_.
Done, emailed.
Several items need to be fixed in the workflow.
* Fix audio dynamics compression (this issue)
Reverting to the previous audio filter will address this.
* Transcripts are not working - due to conflicting dependencies
Hm... the current state of the art for this is OpenAI Whisper. I have it
working on Windows, but it's the only Python I use on a regular basis, so I'm
afraid I'm not the best person to consult on dependency issues. I believe conda
solves this problem, so maybe creating a brand new environment just for
running Whisper would work? And then "conda activate whisper / (do the thing)
/ conda deactivate" would work from a shell scripting standpoint?
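
As a rough sketch of what that could look like from a script: the environment name "whisper", the Python version, the model size and the transcribe helper are my assumptions, and I haven't tried this against the HPR setup. It uses "conda run" instead of the activate/deactivate pair, since that avoids having to load conda's shell hook in a non-interactive script:

```shell
#!/bin/sh
# Sketch only: the environment name "whisper", the Python version, the model
# size and the transcribe helper are assumptions for illustration.

# One-time setup, run by hand:
#   conda create -y -n whisper python=3.11
#   conda run -n whisper pip install -U openai-whisper

transcribe() {
    # $1 = audio file, $2 = directory for the transcript.
    # "conda run -n ENV CMD" runs CMD inside ENV, so the script does not need
    # an activate/deactivate pair.
    conda run -n whisper whisper "$1" --model base \
        --output_format srt --output_dir "$2"
}
```

A call would then be something like "transcribe hpr1234_norm.wav transcripts" (both names are placeholders).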
* Some audio files (most recently FLAC) come in without a duration -
why is this?
I don't know offhand, but the resulting .wav normalized output will most
definitely have a duration, so maybe use that for duration instead of the
mangled FLAC input?
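
Reading the duration from the normalized .wav could look roughly like this, assuming ffprobe is installed alongside ffmpeg; get_duration is a made-up helper name:

```shell
#!/bin/sh
# Sketch only: get_duration is a hypothetical helper name.
get_duration() {
    # Ask ffprobe for the container duration in seconds and print only the value.
    ffprobe -v error -show_entries format=duration \
        -of default=noprint_wrappers=1:nokey=1 "$1"
}
```

In the script that would be something like duration="$(get_duration "hpr${ep_num}_norm.wav")", with the result in seconds as a decimal number.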
Thanks everyone for Volunteering - And remember if you sign up before 2025
with the offer code "I<3HPR", we will double the remuneration for new
Contributors.
A bargain at twice the price!
--
Jim Leonard ([email protected], @mobygamer)
Videos about vintage personal computing: https://youtube.com/TheOldskoolPC
A child borne of the home computer wars: https://trixter.oldskool.org/
You're all insane and trying to steal my magic bag!