On Mon, Jan 12, 2026 at 9:17 PM Branko Čibej <[email protected]> wrote:
> On 12. 1. 26 18:31, Timofei Zhakov wrote: > > On Mon, Jan 12, 2026 at 3:29 PM Branko Čibej <[email protected]> wrote: > >> On 12. 1. 26 14:30, Timofei Zhakov wrote: >> >> On Mon, Jan 12, 2026 at 1:22 PM Branko Čibej <[email protected]> wrote: >> >>> On 12. 1. 26 12:52, [email protected] wrote: >>> >>> Author: brane >>> Date: Mon Jan 12 11:52:07 2026 >>> New Revision: 1931262 >>> >>> Log: >>> Fix warnings in the new checksum code. >>> >>> * subversion/libsvn_subr/checksum.c >>> (svn_checksum): Remove unused variable. >>> * subversion/libsvn_subr/checksum_apr.c: Include <limits.h>. >>> (svn_checksum__sha1, svn_checksum__sha1_ctx_update): Do not blindly cast >>> or implicitly narrow the data length. Check the limits first. >>> >>> Modified: >>> subversion/trunk/subversion/libsvn_subr/checksum.c >>> subversion/trunk/subversion/libsvn_subr/checksum_apr.c >>> >>> Modified: subversion/trunk/subversion/libsvn_subr/checksum.c >>> ============================================================================== >>> --- subversion/trunk/subversion/libsvn_subr/checksum.c Mon Jan 12 >>> 11:51:21 2026 (r1931261) >>> +++ subversion/trunk/subversion/libsvn_subr/checksum.c Mon Jan 12 >>> 11:52:07 2026 (r1931262) >>> @@ -475,8 +475,6 @@ svn_checksum(svn_checksum_t **checksum, >>> apr_size_t len, >>> apr_pool_t *pool) >>> { >>> - apr_sha1_ctx_t sha1_ctx; >>> - >>> SVN_ERR(validate_kind(kind)); >>> *checksum = svn_checksum_create(kind, pool); >>> >>> >>> Modified: subversion/trunk/subversion/libsvn_subr/checksum_apr.c >>> ============================================================================== >>> --- subversion/trunk/subversion/libsvn_subr/checksum_apr.c Mon Jan 12 >>> 11:51:21 2026 (r1931261) >>> +++ subversion/trunk/subversion/libsvn_subr/checksum_apr.c Mon Jan 12 >>> 11:52:07 2026 (r1931262) >>> @@ -21,6 +21,8 @@ >>> * ==================================================================== >>> */ >>> >>> +#include <limits.h> >>> + >>> #include "svn_private_config.h" >>> #ifdef SVN_CHECKSUM_BACKEND_APR >>> >>> @@ -89,6 +91,10 @@ svn_checksum__sha1(unsigned char *digest >>> apr_size_t len) >>> { >>> apr_sha1_ctx_t sha1_ctx; >>> + >>> + /* Do not blindly truncate the data length. */ >>> + SVN_ERR_ASSERT(len < UINT_MAX); >>> + >>> apr_sha1_init(&sha1_ctx); >>> apr_sha1_update(&sha1_ctx, data, (unsigned int)len); >>> apr_sha1_final(digest, &sha1_ctx); >>> @@ -121,7 +127,9 @@ svn_checksum__sha1_ctx_update(svn_checks >>> const void *data, >>> apr_size_t len) >>> { >>> - apr_sha1_update(&ctx->apr_ctx, data, len); >>> + /* Do not blindly truncate the data length. */ >>> + SVN_ERR_ASSERT(len < UINT_MAX); >>> + apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len); >>> return SVN_NO_ERROR; >>> } >>> >>> >>> Just a quick note, it's generally not a good idea to use unchecked >>> typecasts anywhere. In this case there was a borderline chance of >>> Subversion quietly computing invalid hashes. The assert at least removes >>> the "quiet" part, although I suspect the better solution is to call >>> apr_sha1_update in UINT_MAX-sized chunks until all data are consumed. >>> >> >> I agree. It would be great if APR was accepting size as size_t. But it's >> not a big deal. Anyways, I'm pretty sure the issue existed there before. >> >> Maybe this function could feed it blocks of UINT_MAX if it exceeds the >> limit. But I don't think we should bother to support such edge cases, >> because if it happens, the issue is on the caller side -- it should not >> allocate 4GB buffers in memory. >> >> >> >> Is that a fact? Sorry but these days 4gigs is laughably small in certain >> applications. Our API promises "apr_size_t" chunks which on 64-bit >> platforms means larger than 4 GiB. We can change our API or we can fix the >> bug. It doesn't matter if the bug was there before, we know it's there now. >> >> -- Brane >> >> > Okay, it makes sense. I made a patch to support that behaviour. What do > you think? > > [[[ > Index: subversion/libsvn_subr/checksum_apr.c > =================================================================== > --- subversion/libsvn_subr/checksum_apr.c (revision 1931267) > +++ subversion/libsvn_subr/checksum_apr.c (working copy) > @@ -91,12 +91,19 @@ > apr_size_t len) > { > apr_sha1_ctx_t sha1_ctx; > + const char *ptr = data; > > - /* Do not blindly truncate the data length. */ > - SVN_ERR_ASSERT(len < UINT_MAX); > + apr_sha1_init(&sha1_ctx); > > - apr_sha1_init(&sha1_ctx); > - apr_sha1_update(&sha1_ctx, data, (unsigned int)len); > + while (len > 0) > + { > + unsigned int block = (len < UINT_MAX) ? (unsigned int)len > + : UINT_MAX; > + apr_sha1_update(&sha1_ctx, ptr, block); > + len -= block; > + ptr += block; > + } > + > apr_sha1_final(digest, &sha1_ctx); > return SVN_NO_ERROR; > } > @@ -127,9 +134,17 @@ > const void *data, > apr_size_t len) > { > - /* Do not blindly truncate the data length. */ > - SVN_ERR_ASSERT(len < UINT_MAX); > - apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len); > + const char *ptr = data; > + > + while (len > 0) > + { > + unsigned int block = (len < UINT_MAX) ? (unsigned int)len > + : UINT_MAX; > + apr_sha1_update(&ctx->apr_ctx, ptr, block); > + len -= block; > + ptr += block; > + } > + > return SVN_NO_ERROR; > } > ]]] > > I don't really like casting 'void *' to 'char *' but it seems like the > only option. > > > That cast isn't a problem, it already happens implicitly when > apr_sha1_update is called; this is from apr_sha1.h: > > APU_DECLARE(void) apr_sha1_update(apr_sha1_ctx_t *context, const char *input, > unsigned int inputLen); > > Oh, good to know. > > It would be nice, or at least it would satisfy my sense of aesthetics, if > the loop wasn't copy-pasted twice ... I'd make a helper function that will > almost certainly be inlined. As a bonus, that helper function can take a > 'const char*' argument, then you don't need the cast nor a second variable. > Something like this? > > static void update_sha1(apr_sha1_ctx_t *ctx, const char* data, apr_size_t len) > { > while (len > UINT_MAX) > { > apr_sha1_update(ctx, data, UINT_MAX); > len -= UINT_MAX; > data += UINT_MAX; > } > if (len > 0) > apr_sha1_update(ctx, data, (unsigned int)len); /* This cast is safe. */ > } > > > then just call update_sha1() with no casts instead of apr_sha1_update(). > Nice. > > I thought about making a function. It might be better for readability to have everything inlined rather than adding extra abstractions. But as I think now it is probably worth it. This is like a wrapper around APR which allows for bigger buffers. And again it removes the risk of copy-paste mistakes. It's also a good argument to utilise implicit cast to char-ptr during function invocation. Could you please elaborate why you are suggesting re-writing the loop in this way? I think it was much simpler before. I would prefer to have a single invocation of the function, because it assures that no matter what we'd get the same behaviour. Imagine mistyping parameters in the first one (inside of the loop) which would never be actually executed and then trying to figure out where that bug comes from (and this all will most likely happen while performing some unbounded operation which is also a bug). I see that it removes the ternary operator, but it was good in explaining what this code does; feed it UINT_MAX bytes or whatever amount we have. On the other hand, after your rewrite, two completely separate branches would be executed depending on whether it iterates UINT_MAX blocks or feeds the remaining. > I'm planning to do a similar thing to the bcrypt backend. A tiny note: > long is usually 32 bit even on 64 bit platforms (on Windows). > > > +1. Yes, I remembered later that ULONG == DWORD == not something I'd use > for object sizes on a 64-bit platform. Same problem as using unsigned int > for the length in APR. > > -- Brane > > -- Timofei Zhakov

