Re: [FFmpeg-user] FFmpeg AAC encoder produces harsh noise on specific voice

Aditya Dandavate Sun, 10 Aug 2025 07:15:15 -0700

Dude, if it's possible, I'd strongly recommend using Opus (with the
libopus encoder). Even at 128k VBR its quality is excellent for most
content, although complex audio may need a higher bitrate. If you can't
use Opus, libfdk_aac is recommended for better AAC quality, and if you
can use neither libfdk_aac nor libopus, even the libmp3lame MP3 encoder
with "-compression_level 0" and "-cutoff 0" should give you very good
quality.
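
To make the three fallbacks above concrete, here is a minimal sketch that only builds the ffmpeg argument lists and prints them. The file names, the "-vbr on" spelling for libopus, and the 128k bitrate for libfdk_aac are my assumptions for illustration; libfdk_aac also needs an ffmpeg build that includes it, and the MP3 bitrate is left at the encoder default because none was named above.

```python
import shlex


def alt_encoder_commands(src="input2.wav", stem="output2"):
    """Build ffmpeg argument lists for the three alternative encoders.

    File names and bitrates are illustrative assumptions, not tested settings.
    """
    return {
        # Opus at 128k with VBR enabled (libopus option "vbr").
        "libopus": ["ffmpeg", "-i", src, "-c:a", "libopus",
                    "-b:a", "128k", "-vbr", "on", f"{stem}_opus.opus"],
        # Fraunhofer FDK AAC; only works if ffmpeg was built with libfdk_aac.
        "libfdk_aac": ["ffmpeg", "-i", src, "-c:a", "libfdk_aac",
                       "-b:a", "128k", f"{stem}_fdk.m4a"],
        # LAME with the two options quoted in the message above.
        "libmp3lame": ["ffmpeg", "-i", src, "-c:a", "libmp3lame",
                       "-compression_level", "0", "-cutoff", "0",
                       f"{stem}_lame.mp3"],
    }


if __name__ == "__main__":
    for argv in alt_encoder_commands().values():
        # shlex.join gives a copy-pasteable shell command line.
        print(shlex.join(argv))
```

The printed lines can be pasted into a shell, or each list passed to subprocess.run, once the encoders are confirmed to exist in the local build.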


All the best 👍🏻

On Sun, 10 Aug, 2025, 6:37 pm Agent 45, <jacka...@gmail.com> wrote:

> Thank you for the detailed explanation and suggestions.
> Changing either -aac_coder (to fast) or -aac_pns (to disable) significantly
> reduces the metallic noise in my tests, so I’ll continue experimenting with
> these options. If that still doesn’t give satisfactory results in some
> cases, I’ll also try alternative encoders like libfdk_aac as you suggested.
>
> I also hope posting on both the mailing list and code.ffmpeg.org hasn’t
> caused any inconvenience — I only recently joined and wasn’t aware both
> were still active.
>
> Anton Kapela <tkap...@gmail.com> wrote on Sun, Aug 10, 2025 at 05:45:
>
> > This phoneme in particular will probably always encode poorly on simplistic
> > AAC implementations like libavcodec. Why? More on that later.
> >
> > Others already suggested libfdk_aac, and after testing that coder with your
> > samples, it's definitely the right choice (i.e. it sounds fine to me at
> > 128k). More details on the Fraunhofer FDK AAC coder here:
> > https://ffmpeg.org/ffmpeg-codecs.html#libfdk_005faac - and a sample of its
> > output at 128k using your "input2" source is attached.
> >
> > It's clear you've hit one of the many poorly handled corner cases of this
> > AAC implementation. If you're curious why, read on.
> >
> > ---
> >
> > First, I'd recommend some experimentation: switch between the available
> > coder models ("aac_coder"), then also toggle aac_pns, aac_tns, and aac_ltp,
> > and listen for whether the character of the error changes. Details here:
> > https://ffmpeg.org/ffmpeg-codecs.html#aac
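
The experiment suggested above can be laid out as a small test matrix. This is only a sketch: it builds one command per combination of coder and on/off flags and prints them, so nothing is encoded until you run the lines yourself. The output naming scheme is made up for illustration, and aac_ltp is left out because I am not sure it is accepted by every build.

```python
import itertools
import shlex


def toggle_matrix(src="input2.wav", bitrate="128k",
                  coders=("twoloop", "fast"),
                  flags=("aac_pns", "aac_tns")):
    """Return one ffmpeg argv per combination of coder and on/off flags."""
    cmds = []
    for coder in coders:
        # Every on/off combination of the boolean encoder options.
        for values in itertools.product((1, 0), repeat=len(flags)):
            argv = ["ffmpeg", "-i", src, "-c:a", "aac", "-b:a", bitrate,
                    "-aac_coder", coder]
            tag = coder
            for flag, value in zip(flags, values):
                argv += [f"-{flag}", str(value)]
                tag += f"_{flag.replace('aac_', '')}{value}"
            # Hypothetical naming scheme so each encode is easy to identify.
            argv.append(f"toggle_{tag}.m4a")
            cmds.append(argv)
    return cmds


if __name__ == "__main__":
    for argv in toggle_matrix():
        print(shlex.join(argv))
```

Listening to the resulting files in a fixed order, with only one setting changed between neighbours, makes it easier to tell which tool changes the character of the noise.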
> >
> > As to why this signal is so badly represented by "twoloop": we need to
> > actually look at the signal we've encountered and understand what
> > it represents. Interestingly, this particular sound presents a relatively
> > simple time domain character, but is rather complex in the frequency
> > domain. What we have here is a textbook example of:
> > https://en.wikipedia.org/wiki/Cyclostationary_process - mixed with a flavor
> > of https://en.wikipedia.org/wiki/Frequency_comb - which, taken together,
> > present a unique problem for any block-based MDCT codec scheme: to
> > coherently describe the subtle time domain components of a strongly
> > modulated signal in a purely block-based, frequency-transformed domain.
> >
> > Let's examine this signal's major features, looking at "input2" here, since
> > it's the longest and simplest example in your set:
> >
> > -the formant pitch is ~274 Hz
> > -an in-phase high frequency burst occurs at *half* that frequency - about
> > 137 bursts/sec, roughly one every 7.3 msec
> > -the modulated burst is "ringing" around 4700 Hz
> > -the formant and harmonics have a slow downwards frequency drift, along
> > with short-term trills and warble
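
For anyone who wants to poke at this without the original attachments, the features listed above can be approximated with a synthetic signal. This is a rough sketch, not a model of the actual voice: the 48 kHz sample rate, the amplitudes, and the 1 ms decay of each burst are my assumptions, and the slow pitch drift and warble are left out.

```python
import math
import struct
import wave


def synth_bursty_vowel(duration=1.0, fs=48000, f0=274.0,
                       ring_hz=4700.0, decay_s=0.001):
    """Steady tone at f0 plus decaying ring_hz bursts at f0 / 2 per second."""
    burst_period = 1.0 / (f0 / 2.0)  # one burst every second pitch period
    samples = []
    for n in range(int(duration * fs)):
        t = n / fs
        tone = 0.5 * math.sin(2.0 * math.pi * f0 * t)
        tb = t % burst_period  # time since the most recent burst onset
        burst = (0.25 * math.exp(-tb / decay_s)
                 * math.sin(2.0 * math.pi * ring_hz * tb))
        samples.append(tone + burst)
    return samples


def write_wav(path, samples, fs=48000):
    """Write mono 16-bit PCM, clipping to [-1, 1] first."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(fs)
        w.writeframes(frames)
```

Calling write_wav("synthetic_input.wav", synth_bursty_vowel()) gives a one-second file that can be fed to the same encoder commands as the real samples. Whether it triggers the same artifact is something to verify by ear, not something this sketch guarantees.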
> >
> > This all adds up to create a situation in which high frequency bands are
> > "sparse" in an absolute energy sense (relative to the formant pitch), but
> > which present ever-so-slight differences over short time scales (block
> > lengths, even if dynamic, will never be in-phase with the signal features).
> > These differences prevent the twoloop algorithm from making
> > *consistent*-sounding decisions, which is why we hear
> > swish/flutter/chirpy noises at almost any rate for signals of this type.
> > Important decisions like "is this part of the signal a transient?", "do
> > these coefficients contain enough entropy to matter?", or "should we
> > substitute noise?" will radically alter the character of the reproduced
> > signal, especially over the course of the signal's evolution.
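
The point about block lengths never lining up with the signal can be checked with simple arithmetic. The sketch below assumes AAC-LC framing of 1024 new samples per long frame and 128 per short block, and a 48 kHz sample rate; the actual rate of the samples is not stated in this thread, so treat that value as a placeholder.

```python
def frame_alignment(burst_rate_hz=137.0, fs=48000,
                    long_len=1024, short_len=128):
    """Compare the burst period with AAC long-frame and short-block durations."""
    burst_ms = 1000.0 / burst_rate_hz
    long_ms = 1000.0 * long_len / fs
    short_ms = 1000.0 * short_len / fs
    bursts_per_long = long_ms / burst_ms
    return {
        "burst_period_ms": burst_ms,
        "long_frame_ms": long_ms,
        "short_block_ms": short_ms,
        "bursts_per_long_frame": bursts_per_long,
        # How far the burst onset shifts from one long frame to the next,
        # as a fraction of one burst period (0 would mean perfectly locked).
        "phase_step": bursts_per_long % 1.0,
    }


if __name__ == "__main__":
    for key, value in frame_alignment().items():
        print(f"{key}: {value:.3f}")
```

Under these assumptions a long frame holds a little under three bursts, so each frame sees the bursts at a different offset, and a short block is shorter than one burst period, so some short blocks contain a burst onset and some do not. That is consistent with the encoder making different decisions from frame to frame on a signal that sounds steady.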
> >
> > Why? Well, "twoloop" in FFmpeg's native AAC encoder is a classic
> > rate-distortion search and quantizer allocation scheme. It optimizes
> > scalefactors per codebook, and across bands (two nested loops), on top of
> > FFmpeg's psychoacoustic masking model. It then employs the usual AAC tools
> > (block switching, M/S and intensity stereo, PNS, and TNS) in its RD loop.
> > It does not implement high-band envelope detection or cross-band "carrier
> > vs. envelope" tools like SBR/PS, or like we find in AC3. In contrast,
> > libfdk_aac does, and employs a more complete hybrid, contextual
> > psychoacoustic masking and ATH model. It also has support for the usual,
> > more complex AAC profiles (HE-AAC v1/v2, ELD/LD), including an
> > "afterburner" analysis-by-synthesis refinement. If one isn't using HE-AACv2
> > options, FDK still employs various refinements necessary to do the fancy
> > stuff, even in LC operation.
> >
> > For comparison, I attached some 128k, 64k, and 48kbit AC3 encodes - you'll
> > hear how even this stone-age codec scheme makes better decisions, and
> > degrades more gracefully, than the current twoloop AAC RD algorithm. Here,
> > the major contributing factor in AC3's ability to code this signal "better"
> > than twoloop AAC lies in its explicit use of "carrier precombination" (read:
> > https://www.fast-and-wide.com/images/stories/White_papers/ac3_multichannel_decoder.pdf
> > ) - which nicely handles cases like yours. This is possible by separating
> > the subband "carrier" signal from its "envelope" after input decomposition
> > by the filterbank. This has the audible effect of preserving interrelated
> > time-domain features of the ~137 "tone bursts" per second in your sample,
> > while still providing coding gain vs. the source PCM.
> >
> > HTH,
> >
> > -Tk
> >
> >
> > On Sat, Aug 9, 2025 at 4:26 AM Agent 45 <jacka...@gmail.com> wrote:
> >
> > > Hello, FFmpeg team,
> > >
> > > I'm encountering a consistent issue when encoding voice with the FFmpeg
> > > AAC encoder.
> > > At low and medium bitrates, the encoded output contains noticeable and
> > > sometimes harsh noise on specific vocals.
> > > This noise gradually decreases as the bitrate increases.
> > >
> > > I’ve attached all files (input and encoded outputs).
> > > Here are the commands used, ffmpeg version 7.1.1:
> > >
> > > ffmpeg -i input1.wav -c:a aac -b:a 128k output1_128k.m4a
> > > ffmpeg -i input2.wav -c:a aac -b:a 128k output2_128k.m4a
> > > ffmpeg -i input3.wav -c:a aac -b:a 128k output3_128k.m4a
> > >
> > > ffmpeg -i input1.wav -c:a aac -b:a 192k output1_192k.m4a
> > > ffmpeg -i input2.wav -c:a aac -b:a 192k output2_192k.m4a
> > > ffmpeg -i input3.wav -c:a aac -b:a 192k output3_192k.m4a
> > >
> > > ffmpeg -i input1.wav -c:a aac -b:a 256k output1_256k.m4a
> > > ffmpeg -i input2.wav -c:a aac -b:a 256k output2_256k.m4a
> > > ffmpeg -i input3.wav -c:a aac -b:a 256k output3_256k.m4a
> > >
> > > # Observations:
> > >
> > > - All 128k versions contain harsh noise, which stays almost the same if
> > > the bitrate is increased to 160k
> > >
> > > - `output1_192k.m4a`: noise at around 0.27s
> > > - `output2_192k.m4a`: No obvious noise detected
> > > - `output3_192k.m4a`: Mild noise at around 0.05s and some noise still
> > > present from 0.3s
> > >
> > > - `output1_256k.m4a`: noise at around 0.27s
> > > - `output2_256k.m4a`: No obvious noise detected
> > > - `output3_256k.m4a`: Mild noise at around 0.05s
> > >
> > > - No noise detected when the bitrate is increased to 320k
> > > _______________________________________________
> > > ffmpeg-user mailing list
> > > ffmpeg-user@ffmpeg.org
> > > https://ffmpeg.org/mailman/listinfo/ffmpeg-user
> > >
> > > To unsubscribe, visit link above, or email
> > > ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".