Hi,

Just following up on this - I'm sorry I haven't been able to look at the proposed patchset myself quite in detail yet.

My prime concern is about the requests to have this merged into the upcoming 6.1 release; that's way too soon IMO.

These patches do change aspects of how these things behave, that have been working the same for a very long time, so there are all sorts of potential subtle breakage, or (incorrect or not) assumptions being broken, across libavcodec and its users.

On Sat, 4 Nov 2023, Derek Buitenhuis wrote:

Next, a quick breakdown of the AAC situation, in terms of both how this it is 
stored,
what we support, and the state of the ecosystem and types of files that exist:
   * 'Raw' ADTS streams have no way to store any of this. The best we can do is 
guess
     the pre-roll. We should not guess priming or end padding, as no matter 
what we do,
     it'll be wrong, and any value will be a cargo culted hack value.

I share this concern; all the various encoders I've seen have used a different amount of priming samples, so guessing it will be bound to be wrong in a lot of the cases.

   * MP4 - there are two places to store this metadata - one standard, and one 
proprietary
     Apple way. There are, separately, two ways to signal priming length when 
SBR is present.
      * MP4s may contain a user data box with 'iTunSMPB' which contains 
priming, pre-roll,
        and end padding data. We support reading only priming data from this at 
the moment,
        and we set skip samples based on this. This is 'iTunes style' metadata.
      * The standards compliant (read: non-iTunes) way is to use an edit list 
to trim the
        priming samples, and, opionally end padding. End padding may also be 
trimmed by reducing
        the sample duration of the last packet in the stts box. Pre-roll is 
store in the sgpb
        box with the 'roll', type, which signals the roll distance as a number 
of packets;
        for example, -1 indicates you should decode an discard the samples of 1 
packet before
        beginning plaback. Notably, this allows the sgpd box to also be use for 
video like
        periodic intra refresh H.264. libavformat does not current parse or 
export this info,
        but even if we did, converting number of packets to audio samples can 
get hairy.
          * Notably, since in MP4, the edit list represents the exact 
presentation-level info,
            when no edit list, or an edit list startiing at 0 is present, no 
samples, not even
            pre-roll should be trimmed - all players in the wild handle this 
properly, and it
            has been standard practice among streaming services for >10 years 
to not output
            the AAC frames representing priming samples at all (even if there 
is a small hit
            quality). This is what the patch at [0] is addressing.

FWIW, MP4 isn't the only container where this might be relevant; AAC is frequently used in muxes together with video in FLV and MKV and others as well.

In the case of FLV, I'm not aware of any metadata that signals how much to trim off, so essentially we can't do it by guessing. On the producing side, this is handled by shifting the timestamps so the audio track, which would be starting at -<delay>, ends up starting at 0, and the video track ends up starting at +<audiodelay> instead.

In that case, if we trim off the priming samples (based on a guess as that's all we have?), I guess that'd lead us to both tracks starting at +<delay> (i.e. not affecting sync). As long as it doesn't change sync, I guess it can be tolerable.

To avoid all these effects, producers of muxed files can work around this in many ways. For many years, I've been doing the trick of skipping the first <delay> samples of input to the audio encoder, so that after accounting for that, I have both audio and video tracks starting at 0.0, without the decoder needing to do anything - working the same across all players, good and bad.

If we suddenly start decoding such files with the audio track starting at +<delay>, I guess it'll be ok for sync, but it's a mildly surprising change, but hopefully any reasonable player based on libavcodec would still not freak out by it.

   * As noted above, I don't think we should apply any guessed priming to 
initial samples (pre-roll,
     or 'algorithmic delay, included). No other decoders or players do this, in 
the world, to my
     knowledge, and violating the principal of least surpise because we think 
we're slightly more
     correct isn't great. I also think trying to 'fix' raw ADTS is destined to 
always be a hack,
     an we shouldn't. YMMV. I'd like to hear views from others here. This would 
make the patch in
     [0] redundant.

Yes, with raw ADTS there's really no good way of getting this right, other than plain guessing, and there's no single universally correct guess AFAIK.

(And even if we have a qualified guess for the amount of encoder priming, we have even less knowledge about how much to trim off at the end, if we're aiming at proper gapless playback.)

For MP4 there's at least a couple ways of signalling it.

But also, given all of this, I think we need to deeply consider how we approach this, so we don't end up with something that only covers certain cases (and I am sure I forgot more cases). To that end, I do not think rushing to get a patchset that can change sync on all AAC files in existence into 6.1 is wise. Even when this does go in, it should be able to sit in master for a good long time before being in a release.

+1. This has the potential to be surprising in many different cases, and may need a bunch of follow up patches to sort out cases found later. It definitely should sit in git master for a some time before ending up in a release - not be slipped into 6.1 the week before the release.

// Martin

_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to