> -----Original Message-----
> From: Stefano Sabatini <stefa...@gmail.com>
> Sent: Montag, 28. April 2025 23:47
> To: softworkz . <softwo...@hotmail.com>
> Cc: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org>
> Subject: Re: [FFmpeg-devel] [PATCH v7 02/13] fftools/textformat: Apply quality
> improvements
> 
> On date Monday 2025-04-28 20:40:37 +0000, softworkz . wrote:
> [...]
> > > > > >      /* validate replace string */
> > > > > >      {
> > > > > > -        const uint8_t *p = tctx->string_validation_replacement;
> > > > > > -        const uint8_t *endp = p + strlen(p);
> > > > > > +        const uint8_t *p = (uint8_t *)tctx-
> > > >string_validation_replacement;
> > > > > > +        const uint8_t *endp = p + strlen((const char *)p);
> > > > > >          while (*p) {
> > > > > >              const uint8_t *p0 = p;
> > > > > >              int32_t code;
> > > > > >              ret = av_utf8_decode(&code, &p, endp, tctx-
> > > > > >string_validation_utf8_flags);
> > > > > >              if (ret < 0) {
> > > > >
> > > > > >                  AVBPrint bp;
> > > > > > -                av_bprint_init(&bp, 0, AV_BPRINT_SIZE_AUTOMATIC);
> > > > > > +                av_bprint_init(&bp, 0, AV_BPRINT_SIZE_UNLIMITED);
> > > > > >                  bprint_bytes(&bp, p0, p - p0),
> > > > > >                      av_log(tctx, AV_LOG_ERROR,
> > > > > >                             "Invalid UTF8 sequence %s found in
> string
> > > > > validation replace '%s'\n",
> > > > > >                             bp.str, tctx-
> > > >string_validation_replacement);
> > > > > > -                return ret;
> > > > > > +                av_bprint_finalize(&bp, NULL);
> > > > >
> > >
> > > > > I see no advantage in this, since it's adding dynamic allocation and
> > > > > the need to free which was previously not needed
> > > >
> > > > The dynamic allocation does not happen until the buffer content size
> > > > does not fit anymore into the stack-alloced memory.
> > >
> > > Yes, but is this ever happening?
> >
> 
> > Are we sure that it's not? Because unless we are 100% sure, we need
> > to check the return value and when that check indicates an overflow
> > of the buffer - what then? How to remedy?
> >
> > The check for the return value alone - even without any remediation -
> > is more complicated code (more lines) than a single line with
> > av_bprint_finalize().
> >
> > That's my general rationale behind favoring _UNLIMITED.
> >
> > There are more cases where I have that, and where it's really relevant
> > but for this one, I'll just remove the change, it's not worth the
> > discussion.
> 
> So I checked the code, a valid UTF8 sequence should be no longer than
> six bytes, and reading the implementation of av_utf8_decode I could
> not quickly find out the maximum possible size of an invalid sequence,
> so I guess your argument has a point in case of crafted sequences -
> although possibly we might compute the maximum size and it might be
> small (assuming there are no mistakes in the implementation and no
> regressions are introduced). Even in that case, the worst it might
> happen would be that the sequence is truncated, so this should have no
> security implications anyway - while allocation might fail (in case of
> a very looong crafted string), leading again to truncation.

Yea, that aligns with my view. When I added the mermaid output I 
was really glad to find out that it fits fairly well into the sections
logic, but the data that is in play is quite different in some cases,
especially when it comes to the size of certain strings, and right
away I ran into a bug due to a truncated string from an AVBprint with 
SIZE_AUTOMATIC. Which brought me to say: "Stop making assumptions".
Making assumptions about potential max string lengths (like by looking
at all call sites) is not a robust approach. Nobody else knows which 
assumptions you were making, or even more typical: at some point 
you might even forget about it and run into it yourself. 😃 

Then I looked through all BPrints and there wasn't a single one where
I could have said I'm 100% sure that it won't overflow (such cases
exist of course - elsewhere).

Having _UNLIMITED everywhere (in the AVTextFormat scope) also has 
advantages in maintenance:

- It will always just work
- No need to check return values of BPrint APIs
- Every BPrint needs a finalize() call - always, without condition
  this is much easier to work with, and to maintain - less error-prone
- The finalize() call is generally negligible unless it's a very 
  special case

What is even remaining on the pro-side for _SIZE_AUTOMATIC?

- You save the finalize() call

And that's it. Might sometimes be relevant or more elegant - yet 
often not.

At the moment, all BPrint inits are _UNLIMITED already (in the scope
of these APIs) and it's a good and safe thing IMO.

> That said, I'm not against this change but it should be moved to a
> dedicated change.

Roger!

Thanks
sw
_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".

Reply via email to