Re: [FFmpeg-user] FFmpeg AAC encoder produces harsh noise on specific voice

Aditya Dandavate Sun, 10 Aug 2025 07:16:47 -0700

Oh yeah, for libmp3lame, make sure to use "-q:a 0" for best-quality VBR.
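A minimal sketch of the MP3 fallback with this setting, assuming an ffmpeg build that includes libmp3lame (file names are illustrative). In libmp3lame, "-q:a" maps to LAME's VBR quality scale, where 0 is the highest, and "-compression_level" picks LAME's algorithm quality, where 0 is the slowest and most thorough. The helper only prints the command unless RUN=1 is set in the environment.

```shell
#!/bin/sh
# Print the encode command; execute it only when RUN=1 is set.
# Assumes an ffmpeg build that includes libmp3lame.
mp3_v0() {
  # $1 = input file, $2 = output file (names are illustrative)
  set -- ffmpeg -i "$1" -c:a libmp3lame -q:a 0 -compression_level 0 "$2"
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

mp3_v0 input2.wav output2_v0.mp3
```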

On Sun, 10 Aug, 2025, 7:44 pm Aditya Dandavate, <adityadandavat...@gmail.com>
wrote:


> Dude, if it's possible, I'd strongly recommend using Opus (with the
> libopus encoder). Even at 128k VBR its quality is excellent for most
> content, without a doubt (though you may need a higher bitrate for complex
> audio). But yeah, if you can't use Opus, libfdk_aac is recommended for
> better quality, and if you can use neither libfdk_aac nor libopus, even
> the libmp3lame MP3 encoder with "-compression_level 0" and "-cutoff 0"
> should give you very good quality.
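A sketch of the two recommended alternatives applied to one of the samples, assuming ffmpeg was built with libopus and libfdk_aac (the latter requires a non-free build configured with --enable-libfdk-aac and --enable-nonfree). The 128k rate follows the suggestion above; the helper and output names are made up for illustration. Commands are printed, and only executed when RUN=1 is set.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper: encode one source with both recommended encoders.
encode_alternatives() {
  src=$1
  base=${src%.*}
  # Opus at 128k; "-vbr on" (libopus's default mode) is spelled out here.
  run ffmpeg -i "$src" -c:a libopus -b:a 128k -vbr on "${base}_opus_128k.opus"
  # Fraunhofer FDK AAC-LC at 128k.
  run ffmpeg -i "$src" -c:a libfdk_aac -b:a 128k "${base}_fdk_128k.m4a"
}

encode_alternatives input2.wav
```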
>
> All the best 👍🏻
>
> On Sun, 10 Aug, 2025, 6:37 pm Agent 45, <jacka...@gmail.com> wrote:
>
>> Thank you for the detailed explanation and suggestions.
>> Changing either -aac_coder (to fast) or -aac_pns (to disable)
>> significantly reduces the metallic noise in my tests, so I’ll continue
>> experimenting with these options. If that still doesn’t give satisfactory
>> results in some cases, I’ll also try alternative encoders like libfdk_aac
>> as you suggested.
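For reference, the two variants described here could be produced next to a default encode roughly like this. This is a sketch: the helper name and output names are hypothetical, and "-aac_pns 0" is used as the boolean spelling of "disable". Commands are printed, and executed only when RUN=1 is set.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper: default encode plus the two variants reported to help.
native_variants() {
  src=$1; br=$2; base=${src%.*}
  # Encoder defaults, as a reference point.
  run ffmpeg -i "$src" -c:a aac -b:a "$br" "${base}_${br}_default.m4a"
  # Variant 1: switch the coding algorithm to "fast".
  run ffmpeg -i "$src" -c:a aac -b:a "$br" -aac_coder fast "${base}_${br}_fast.m4a"
  # Variant 2: keep the default coder, disable perceptual noise substitution.
  run ffmpeg -i "$src" -c:a aac -b:a "$br" -aac_pns 0 "${base}_${br}_nopns.m4a"
}

native_variants input2.wav 128k
```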
>>
>> I also hope posting on both the mailing list and code.ffmpeg.org hasn’t
>> caused any inconvenience — I only recently joined and wasn’t aware both
>> were still active.
>>
>> On Sun, 10 Aug 2025 at 05:45, Anton Kapela <tkap...@gmail.com> wrote:
>>
>> > This phoneme in particular will probably always encode poorly on
>> > simplistic AAC implementations like libavcodec's native encoder. Why?
>> > More on that later.
>> >
>> > Others already suggested libfdk_aac, and after testing that coder with
>> > your samples, it's definitely the right choice (i.e., it sounds fine to
>> > me at 128k). More deets on the Fraunhofer FDK AAC coder here:
>> > https://ffmpeg.org/ffmpeg-codecs.html#libfdk_005faac - and a sample of
>> > its output at 128k using your "input2" source is attached.
>> >
>> > It's clear you've hit one of the many poorly handled corner cases of this
>> > AAC implementation. If you're curious why, read on.
>> >
>> > ---
>> >
>> > First, I'd recommend some experimentation: toggle the available coder
>> > models (aac_coder), then also toggle aac_pns, aac_tns, and aac_ltp, and
>> > listen for whether the character of the error changes. Details here:
>> > https://ffmpeg.org/ffmpeg-codecs.html#aac
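One way to run that experiment systematically is a small grid over the coder and the PNS/TNS switches, sketched below (the function name and output naming are hypothetical). aac_ltp is left out because it is tied to the AAC-LTP profile; check the encoder documentation linked above before adding it. The sweep yields eight files per input and bitrate, which can then be compared by ear.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper: encode one input with every combination of
# coder (twoloop/fast), PNS (on/off) and TNS (on/off) at one bitrate.
sweep() {
  src=$1; br=$2; base=${src%.*}
  for coder in twoloop fast; do
    for pns in 1 0; do
      for tns in 1 0; do
        run ffmpeg -i "$src" -c:a aac -b:a "$br" -aac_coder "$coder" \
          -aac_pns "$pns" -aac_tns "$tns" \
          "${base}_${br}_${coder}_pns${pns}_tns${tns}.m4a"
      done
    done
  done
}

sweep input2.wav 128k
```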
>> >
>> > As to why this signal is so badly represented by "twoloop": we need to
>> > actually look at the signal we've encountered and understand what it
>> > represents. Interestingly, this particular sound has a relatively simple
>> > time-domain character, but is rather complex in the frequency domain.
>> > What we have here is a textbook example of
>> > https://en.wikipedia.org/wiki/Cyclostationary_process - mixed with a
>> > flavor of https://en.wikipedia.org/wiki/Frequency_comb - which, taken
>> > together, present a unique problem for any block-based MDCT codec
>> > scheme: to coherently describe the subtle time-domain components of a
>> > strongly modulated signal in a purely block-based, frequency-transformed
>> > domain.
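To see the "simple in time, complex in frequency" character directly, one option is to render a waveform and a spectrogram of the source and of an encode, as sketched here. The showspectrumpic and showwavespic filters are part of FFmpeg's filter set; the image sizes and the 8 kHz upper limit (stop=8000) are arbitrary choices, so confirm the available options with `ffmpeg -h filter=showspectrumpic` on your build. The helper name is hypothetical.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper: spectrogram (0-8 kHz, enough for the ~4700 Hz
# bursts) and waveform images for one file.
render_views() {
  src=$1; base=${src%.*}
  run ffmpeg -i "$src" -lavfi "showspectrumpic=s=1280x640:stop=8000" "${base}_spectrum.png"
  run ffmpeg -i "$src" -lavfi "showwavespic=s=1280x240" "${base}_wave.png"
}

# Source versus the 128k native-AAC encode.
render_views input2.wav
render_views output2_128k.m4a
```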
>> >
>> > Let's examine this signal's major features, looking at "input2" here,
>> > since it's the longest and simplest example in your set:
>> >
>> > - the formant pitch is ~274 Hz
>> > - an in-phase high-frequency burst occurs at *half* that frequency:
>> >   ~137 bursts/sec, roughly one every 7.3 msec (i.e., every other
>> >   ~3.6 msec pitch period)
>> > - the modulated burst is "ringing" around 4700 Hz
>> > - the formant and harmonics have a slow downward frequency drift, along
>> >   with short-term trills and warble
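The arithmetic behind these numbers, and how the bursts line up against AAC-LC block sizes (1024-sample frames, 128-sample short windows), can be checked with a few lines of awk. The thread does not state the sample rate of the inputs, so the 48 kHz used below is an assumption; rerun with 44100 if that is what the files use. The helper name is hypothetical.

```shell
#!/bin/sh
# Relate the burst timing to AAC-LC block sizes.
# $1 = pitch in Hz, $2 = sample rate in Hz (48000 is an assumption;
# the thread does not state the inputs' sample rate).
burst_math() {
  awk -v f0="$1" -v sr="$2" 'BEGIN {
    burst_hz   = f0 / 2           # one burst every other pitch period
    period_ms  = 1000 / burst_hz  # time between bursts
    period_smp = sr / burst_hz    # samples between bursts
    long_frame = 1024             # AAC-LC frame length in samples
    short_win  = 128              # AAC-LC short window length in samples
    # How far each new frame start slides within the burst cycle
    drift = long_frame - int(long_frame / period_smp) * period_smp
    printf "burst rate: %.1f /s\n", burst_hz
    printf "burst spacing: %.2f ms (%.1f samples)\n", period_ms, period_smp
    printf "bursts per long frame: %.2f\n", long_frame / period_smp
    printf "short windows per burst: %.2f\n", period_smp / short_win
    printf "frame-to-frame phase shift: %.1f samples\n", drift
  }'
}

burst_math 274 48000
```

Under these assumptions each long frame holds about 2.9 bursts and the burst pattern slides by roughly 323 samples from one frame to the next, so no fixed block grid stays in phase with the signal.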
>> >
>> > This all adds up to create a situation in which the high-frequency bands
>> > are "sparse" in an absolute energy sense (relative to the formant pitch),
>> > but present ever-so-slight differences over short time scales (block
>> > lengths, even if dynamic, will never be in phase with the signal's
>> > features). These differences prevent the twoloop algorithm from making
>> > *consistent*-sounding decisions, which is why we hear swish, flutter, and
>> > chirpy noises at almost any rate for signals of this type. Important
>> > decisions like "is this part of the signal a transient?", "do these
>> > coefficients contain enough entropy to matter?", or "should we substitute
>> > noise?" will radically alter the character of the reproduced signal,
>> > especially over the course of the signal's evolution.
>> >
>> > Why? Well, “twoloop” in FFmpeg’s native AAC encoder is a classic
>> > rate–distortion search and quantizer allocation scheme. It optimizes
>> > scalefactors per codebook, and across bands (two nested loops), on top of
>> > FFmpeg’s psychoacoustic masking model. It then employs the usual AAC
>> > tools (block switching, M/S and intensity stereo, PNS, and TNS) in its
>> > RD loop. It implements neither high-band envelope detection nor
>> > cross-band “carrier vs. envelope” tools like SBR/PS, or like those we
>> > find in AC3. In contrast, libfdk_aac does, and it employs a more complete
>> > hybrid, contextual psychoacoustic masking and ATH model. It also supports
>> > the more complex AAC profiles (HE-AAC v1/v2, LD/ELD) and includes an
>> > “afterburner” analysis-by-synthesis refinement. Even in plain LC
>> > operation, without any HE-AACv2 options, FDK still applies the
>> > refinements it needs for the fancier modes.
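A sketch for hearing the FDK features mentioned here, assuming an ffmpeg built with libfdk_aac: the wrapper's afterburner option toggles the analysis-by-synthesis refinement (it is on by default), and -profile:a aac_he selects HE-AAC v1. The bitrates, helper name, and file names are illustrative, not the settings used for the attached sample.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper; assumes ffmpeg was built with libfdk_aac.
fdk_variants() {
  src=$1; base=${src%.*}
  # AAC-LC with the afterburner refinement enabled (the default)...
  run ffmpeg -i "$src" -c:a libfdk_aac -b:a 128k -afterburner 1 "${base}_fdk_lc_ab1.m4a"
  # ...and with it disabled, to hear what the refinement contributes.
  run ffmpeg -i "$src" -c:a libfdk_aac -b:a 128k -afterburner 0 "${base}_fdk_lc_ab0.m4a"
  # HE-AAC v1 (AAC-LC core plus SBR) at a lower bitrate.
  run ffmpeg -i "$src" -c:a libfdk_aac -profile:a aac_he -b:a 64k "${base}_fdk_he_64k.m4a"
}

fdk_variants input2.wav
```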
>> >
>> > For comparison, I attached some 128k, 64k, and 48k AC3 encodes; you'll
>> > hear how even this stone-age codec scheme makes better decisions, and
>> > degrades more gracefully, than the current twoloop AAC RD algorithm.
>> > Here, the major contributing factor in AC3's ability to code this signal
>> > "better" than twoloop AAC lies in its explicit use of "carrier
>> > precombination" (read:
>> > https://www.fast-and-wide.com/images/stories/White_papers/ac3_multichannel_decoder.pdf
>> > ), which nicely handles cases like yours. This is possible by separating
>> > the subband "carrier" signal from its "envelope" after the input has been
>> > decomposed by the filterbank. This has the audible effect of preserving
>> > the interrelated time-domain features of the ~137 "tone bursts" per
>> > second in your sample, while still providing coding gain vs. the source
>> > PCM.
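The AC3 comparison encodes can be approximated with FFmpeg's native ac3 encoder. The exact settings behind the attached files were not stated, so the loop below simply reuses the three bitrates mentioned; the helper and file names are hypothetical.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper: AC-3 encodes at the three bitrates from the message.
ac3_ladder() {
  src=$1; base=${src%.*}
  for br in 128k 64k 48k; do
    run ffmpeg -i "$src" -c:a ac3 -b:a "$br" "${base}_ac3_${br}.ac3"
  done
}

ac3_ladder input2.wav
```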
>> >
>> > HTH,
>> >
>> > -Tk
>> >
>> >
>> > On Sat, Aug 9, 2025 at 4:26 AM Agent 45 <jacka...@gmail.com> wrote:
>> >
>> > > Hello, FFmpeg team,
>> > >
>> > > I'm encountering a consistent issue when encoding voice with the FFmpeg
>> > > AAC encoder. At low and medium bitrates, the encoded output contains
>> > > noticeable and sometimes harsh noise when encoding specific vocals.
>> > > This noise gradually decreases as the bitrate increases.
>> > >
>> > > I’ve attached all files (input and encoded outputs).
>> > > Here are the commands I used (ffmpeg version 7.1.1):
>> > >
>> > > ffmpeg -i input1.wav -c:a aac -b:a 128k output1_128k.m4a
>> > > ffmpeg -i input2.wav -c:a aac -b:a 128k output2_128k.m4a
>> > > ffmpeg -i input3.wav -c:a aac -b:a 128k output3_128k.m4a
>> > >
>> > > ffmpeg -i input1.wav -c:a aac -b:a 192k output1_192k.m4a
>> > > ffmpeg -i input2.wav -c:a aac -b:a 192k output2_192k.m4a
>> > > ffmpeg -i input3.wav -c:a aac -b:a 192k output3_192k.m4a
>> > >
>> > > ffmpeg -i input1.wav -c:a aac -b:a 256k output1_256k.m4a
>> > > ffmpeg -i input2.wav -c:a aac -b:a 256k output2_256k.m4a
>> > > ffmpeg -i input3.wav -c:a aac -b:a 256k output3_256k.m4a
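The nine commands above can also be written as a loop, which makes it easy to include the 160k and 320k rates mentioned in the observations below. A sketch: the helper name is hypothetical, and the mapping from inputN.wav to outputN_<rate>.m4a mirrors the naming used above.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Encode every input at every bitrate with the native AAC encoder.
# inputN.wav -> outputN_<rate>.m4a, matching the names used above.
bitrate_ladder() {
  for src in "$@"; do
    n=${src%.*}   # e.g. input1
    n=${n#input}  # e.g. 1
    for br in 128k 160k 192k 256k 320k; do
      run ffmpeg -i "$src" -c:a aac -b:a "$br" "output${n}_${br}.m4a"
    done
  done
}

bitrate_ladder input1.wav input2.wav input3.wav
```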
>> > >
>> > > # Observations:
>> > >
>> > > - All 128k versions contain harsh noise, and they sound almost the same
>> > > when the bitrate is increased to 160k
>> > >
>> > > - `output1_192k.m4a`: noise at around 0.27s
>> > > - `output2_192k.m4a`: No obvious noise detected
>> > > - `output3_192k.m4a`: Mild noise at around 0.05s and some noise still
>> > > present from 0.3s
>> > >
>> > > - `output1_256k.m4a`: noise at around 0.27s
>> > > - `output2_256k.m4a`: No obvious noise detected
>> > > - `output3_256k.m4a`: Mild noise at around 0.05s
>> > >
>> > > - No noise detected when the bitrate is increased to 320k
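To A/B the reported problem spots, one could cut the same short window out of the source and an encode, for example around the 0.27s artifact in output1. This sketch uses the atrim and asetpts filters; the helper name and window bounds are illustrative, and if the clips do not line up exactly, widening the window is the simplest fix.

```shell
#!/bin/sh
# Print a command; execute it only when RUN=1 is set.
run() {
  printf '%s\n' "$*"
  if [ "${RUN:-0}" = 1 ]; then "$@"; fi
}

# Hypothetical helper: keep [start, end) seconds of a file and reset
# timestamps so the clip starts at zero.
# $1 = input, $2 = start (s), $3 = end (s), $4 = output
clip() {
  run ffmpeg -i "$1" -af "atrim=start=$2:end=$3,asetpts=PTS-STARTPTS" "$4"
}

# Same window from the source and from the 192k encode (around 0.27s).
clip input1.wav 0.20 0.35 input1_clip.wav
clip output1_192k.m4a 0.20 0.35 output1_192k_clip.wav
```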
>> > > _______________________________________________
>> > > ffmpeg-user mailing list
>> > > ffmpeg-user@ffmpeg.org
>> > > https://ffmpeg.org/mailman/listinfo/ffmpeg-user
>> > >
>> > > To unsubscribe, visit link above, or email
>> > > ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".
>> > >
