> -----Original Message----- > From: Stefano Sabatini <stefa...@gmail.com> > Sent: Montag, 28. April 2025 23:47 > To: softworkz . <softwo...@hotmail.com> > Cc: FFmpeg development discussions and patches <ffmpeg-devel@ffmpeg.org> > Subject: Re: [FFmpeg-devel] [PATCH v7 02/13] fftools/textformat: Apply quality > improvements > > On date Monday 2025-04-28 20:40:37 +0000, softworkz . wrote: > [...] > > > > > > /* validate replace string */ > > > > > > { > > > > > > - const uint8_t *p = tctx->string_validation_replacement; > > > > > > - const uint8_t *endp = p + strlen(p); > > > > > > + const uint8_t *p = (uint8_t *)tctx- > > > >string_validation_replacement; > > > > > > + const uint8_t *endp = p + strlen((const char *)p); > > > > > > while (*p) { > > > > > > const uint8_t *p0 = p; > > > > > > int32_t code; > > > > > > ret = av_utf8_decode(&code, &p, endp, tctx- > > > > > >string_validation_utf8_flags); > > > > > > if (ret < 0) { > > > > > > > > > > > AVBPrint bp; > > > > > > - av_bprint_init(&bp, 0, AV_BPRINT_SIZE_AUTOMATIC); > > > > > > + av_bprint_init(&bp, 0, AV_BPRINT_SIZE_UNLIMITED); > > > > > > bprint_bytes(&bp, p0, p - p0), > > > > > > av_log(tctx, AV_LOG_ERROR, > > > > > > "Invalid UTF8 sequence %s found in > string > > > > > validation replace '%s'\n", > > > > > > bp.str, tctx- > > > >string_validation_replacement); > > > > > > - return ret; > > > > > > + av_bprint_finalize(&bp, NULL); > > > > > > > > > > > > > I see no advantage in this, since it's adding dynamic allocation and > > > > > the need to free which was previously not needed > > > > > > > > The dynamic allocation does not happen until the buffer content size > > > > does not fit anymore into the stack-alloced memory. > > > > > > Yes, but is this ever happening? > > > > > Are we sure that it's not? Because unless we are 100% sure, we need > > to check the return value and when that check indicates an overflow > > of the buffer - what then? How to remedy? > > > > The check for the return value alone - even without any remediation - > > is more complicated code (more lines) than a single line with > > av_bprint_finalize(). > > > > That's my general rationale behind favoring _UNLIMITED. > > > > There are more cases where I have that, and where it's really relevant > > but for this one, I'll just remove the change, it's not worth the > > discussion. > > So I checked the code, a valid UTF8 sequence should be no longer than > six bytes, and reading the implementation of av_utf8_decode I could > not quickly find out the maximum possible size of an invalid sequence, > so I guess your argument has a point in case of crafted sequences - > although possibly we might compute the maximum size and it might be > small (assuming there are no mistakes in the implementation and no > regressions are introduced). Even in that case, the worst it might > happen would be that the sequence is truncated, so this should have no > security implications anyway - while allocation might fail (in case of > a very looong crafted string), leading again to truncation.
Yea, that aligns with my view. When I added the mermaid output I was really glad to find out that it fits fairly well into the sections logic, but the data that is in play is quite different in some cases, especially when it comes to the size of certain strings, and right away I ran into a bug due to a truncated string from an AVBprint with SIZE_AUTOMATIC. Which brought me to say: "Stop making assumptions". Making assumptions about potential max string lengths (like by looking at all call sites) is not a robust approach. Nobody else knows which assumptions you were making, or even more typical: at some point you might even forget about it and run into it yourself. 😃 Then I looked through all BPrints and there wasn't a single one where I could have said I'm 100% sure that it won't overflow (such cases exist of course - elsewhere). Having _UNLIMITED everywhere (in the AVTextFormat scope) also has advantages in maintenance: - It will always just work - No need to check return values of BPrint APIs - Every BPrint needs a finalize() call - always, without condition this is much easier to work with, and to maintain - less error-prone - The finalize() call is generally negligible unless it's a very special case What is even remaining on the pro-side for _SIZE_AUTOMATIC? - You save the finalize() call And that's it. Might sometimes be relevant or more elegant - yet often not. At the moment, all BPrint inits are _UNLIMITED already (in the scope of these APIs) and it's a good and safe thing IMO. > That said, I'm not against this change but it should be moved to a > dedicated change. Roger! Thanks sw _______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org https://ffmpeg.org/mailman/listinfo/ffmpeg-devel To unsubscribe, visit link above, or email ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".