On 13. 1. 26 09:57, Timofei Zhakov wrote:
On Mon, Jan 12, 2026 at 9:17 PM Branko Čibej <[email protected]> wrote:

    On 12. 1. 26 18:31, Timofei Zhakov wrote:
    On Mon, Jan 12, 2026 at 3:29 PM Branko Čibej <[email protected]>
    wrote:

        On 12. 1. 26 14:30, Timofei Zhakov wrote:
        On Mon, Jan 12, 2026 at 1:22 PM Branko Čibej
        <[email protected]> wrote:

            On 12. 1. 26 12:52, [email protected] wrote:
            Author: brane
            Date: Mon Jan 12 11:52:07 2026
            New Revision: 1931262

            Log:
            Fix warnings in the new checksum code.

            * subversion/libsvn_subr/checksum.c
               (svn_checksum): Remove unused variable.
            * subversion/libsvn_subr/checksum_apr.c: Include <limits.h>.
               (svn_checksum__sha1, svn_checksum__sha1_ctx_update): Do not blindly cast
                or implicitly narrow the data length. Check the limits first.

            Modified:
                subversion/trunk/subversion/libsvn_subr/checksum.c
                subversion/trunk/subversion/libsvn_subr/checksum_apr.c

            Modified: subversion/trunk/subversion/libsvn_subr/checksum.c
            ==============================================================================
            --- subversion/trunk/subversion/libsvn_subr/checksum.c      Mon Jan 12 11:51:21 2026        (r1931261)
            +++ subversion/trunk/subversion/libsvn_subr/checksum.c      Mon Jan 12 11:52:07 2026        (r1931262)
            @@ -475,8 +475,6 @@ svn_checksum(svn_checksum_t **checksum,
                           apr_size_t len,
                           apr_pool_t *pool)
              {
            -  apr_sha1_ctx_t sha1_ctx;
            -
                SVN_ERR(validate_kind(kind));
                *checksum = svn_checksum_create(kind, pool);
            Modified: subversion/trunk/subversion/libsvn_subr/checksum_apr.c
            ==============================================================================
            --- subversion/trunk/subversion/libsvn_subr/checksum_apr.c  Mon Jan 12 11:51:21 2026        (r1931261)
            +++ subversion/trunk/subversion/libsvn_subr/checksum_apr.c  Mon Jan 12 11:52:07 2026        (r1931262)
            @@ -21,6 +21,8 @@
               * ====================================================================
               */
            +#include <limits.h>
            +
              #include "svn_private_config.h"
              #ifdef SVN_CHECKSUM_BACKEND_APR
            @@ -89,6 +91,10 @@ svn_checksum__sha1(unsigned char *digest
                                 apr_size_t len)
              {
                apr_sha1_ctx_t sha1_ctx;
            +
            +  /* Do not blindly truncate the data length. */
            +  SVN_ERR_ASSERT(len < UINT_MAX);
            +
                apr_sha1_init(&sha1_ctx);
                apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
                apr_sha1_final(digest, &sha1_ctx);
            @@ -121,7 +127,9 @@ svn_checksum__sha1_ctx_update(svn_checks
                                            const void *data,
                                            apr_size_t len)
              {
            -  apr_sha1_update(&ctx->apr_ctx, data, len);
            +  /* Do not blindly truncate the data length. */
            +  SVN_ERR_ASSERT(len < UINT_MAX);
            +  apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
                return SVN_NO_ERROR;
              }

            Just a quick note, it's generally not a good idea to use
            unchecked typecasts anywhere. In this case there was a
            borderline chance of Subversion quietly computing
            invalid hashes. The assert at least removes the "quiet"
            part, although I suspect the better solution is to call
            apr_sha1_update in UINT_MAX-sized chunks until all data
            are consumed.


        I agree. It would be great if APR accepted the size as a
        size_t. But it's not a big deal. Anyway, I'm pretty sure
        the issue already existed before.

        Maybe this function could feed it blocks of UINT_MAX if the
        length exceeds the limit. But I don't think we should bother
        to support such edge cases, because if it happens, the issue
        is on the caller side -- it should not allocate 4 GB buffers
        in memory.


        Is that a fact? Sorry but these days 4gigs is laughably small
        in certain applications. Our API promises "apr_size_t" chunks
        which on 64-bit platforms means larger than 4 GiB. We can
        change our API or we can fix the bug. It doesn't matter if
        the bug was there before, we know it's there now.

        -- Brane


    Okay, it makes sense. I made a patch to support that behaviour.
    What do you think?

    [[[
    Index: subversion/libsvn_subr/checksum_apr.c
    ===================================================================
    --- subversion/libsvn_subr/checksum_apr.c (revision 1931267)
    +++ subversion/libsvn_subr/checksum_apr.c (working copy)
    @@ -91,12 +91,19 @@
                        apr_size_t len)
     {
       apr_sha1_ctx_t sha1_ctx;
    +  const char *ptr = data;

    -  /* Do not blindly truncate the data length. */
    -  SVN_ERR_ASSERT(len < UINT_MAX);
    +  apr_sha1_init(&sha1_ctx);

    -  apr_sha1_init(&sha1_ctx);
    -  apr_sha1_update(&sha1_ctx, data, (unsigned int)len);
    +  while (len > 0)
    +    {
    +      unsigned int block = (len < UINT_MAX) ? (unsigned int)len
    +                                            : UINT_MAX;
    +      apr_sha1_update(&sha1_ctx, ptr, block);
    +      len -= block;
    +      ptr += block;
    +    }
    +
       apr_sha1_final(digest, &sha1_ctx);
       return SVN_NO_ERROR;
     }
    @@ -127,9 +134,17 @@
                                   const void *data,
                                   apr_size_t len)
     {
    -  /* Do not blindly truncate the data length. */
    -  SVN_ERR_ASSERT(len < UINT_MAX);
    -  apr_sha1_update(&ctx->apr_ctx, data, (unsigned int)len);
    +  const char *ptr = data;
    +
    +  while (len > 0)
    +    {
    +      unsigned int block = (len < UINT_MAX) ? (unsigned int)len
    +                                            : UINT_MAX;
    +      apr_sha1_update(&ctx->apr_ctx, ptr, block);
    +      len -= block;
    +      ptr += block;
    +    }
    +
       return SVN_NO_ERROR;
     }
    ]]]

    I don't really like casting 'void *' to 'char *' but it seems
    like the only option.

    That cast isn't a problem, it already happens implicitly when
    apr_sha1_update is called; this is from apr_sha1.h:

    APU_DECLARE(void) apr_sha1_update(apr_sha1_ctx_t *context, const char *input,
                                      unsigned int inputLen);

Oh, good to know.

[...]


    It would be nice, or at least it would satisfy my sense of
    aesthetics, if the loop wasn't copy-pasted twice ... I'd make a
    helper function that will almost certainly be inlined. As a bonus,
    that helper function can take a 'const char*' argument, then you
    don't need the cast nor a second variable. Something like this?

    static void update_sha1(apr_sha1_ctx_t *ctx, const char* data, apr_size_t len)
    {
       while (len > UINT_MAX)
         {
           apr_sha1_update(ctx, data, UINT_MAX);
           len -= UINT_MAX;
           data += UINT_MAX;
         }
       if (len > 0)
         apr_sha1_update(ctx, data, (unsigned int)len);  /* This cast is safe. */
    }


    then just call update_sha1() with no casts instead of
    apr_sha1_update(). Nice.


I thought about making a helper function. At first I felt it might be better for readability to keep everything inlined rather than add an extra abstraction, but now I think it's probably worth it. It acts as a wrapper around APR that allows for bigger buffers, and it also removes the risk of copy-paste mistakes.

It's also a good argument that we can rely on the implicit conversion to 'const char *' at the call site.

Could you please elaborate on why you suggest rewriting the loop this way? I think it was much simpler before. I would prefer a single invocation of the function, because that guarantees we get the same behaviour no matter what. Imagine mistyping the parameters in the first call (the one inside the loop), which in practice would almost never be executed, and then trying to figure out where the bug comes from -- and all of this would most likely happen while performing some unbounded operation, which is itself a bug.

It gets rid of this

unsigned int block = (len < UINT_MAX) ? (unsigned int)len : UINT_MAX;


happening on every operation, something the compiler can't just ignore or optimise out. It also avoids

      len -= UINT_MAX;
      data += UINT_MAX;


when it isn't necessary; something that probably would be optimised out.

*shrug* Compared to the actual SHA-1 calculation, that's trivial. I just find it cleaner to not have to recalculate the block length every time. It's really a matter of taste.


I see that it removes the ternary operator, but that operator was good at explaining what the code does: feed it UINT_MAX bytes, or whatever amount we have left. On the other hand, after your rewrite two completely separate branches are executed, depending on whether we are iterating over UINT_MAX-sized blocks or feeding the remainder.

As I said, it's a matter of taste. I find it more obvious what's happening when the while() condition tests against an explicit, constant upper bound.




    I'm planning to do a similar thing for the bcrypt backend. A tiny
    note: on Windows, long is 32-bit even on 64-bit platforms.

    +1. Yes, I remembered later that ULONG == DWORD == not something
    I'd use for object sizes on a 64-bit platform. Same problem as
    using unsigned int for the length in APR.

    -- Brane



--
Timofei Zhakov
