On Wed, 20 May 2026 at 20:40, Nathan Hartman <[email protected]> wrote:

> On Wed, May 20, 2026 at 9:20 AM Branko Čibej <[email protected]> wrote:
>
>> On Wed, 20 May 2026, 14:40 Daniel Sahlberg, <[email protected]>
>> wrote:
>>
>>> On Wed, 20 May 2026 at 10:55, <[email protected]> wrote:
>>> >
>>> > Author: rinrab
>>> > Date: Wed May 20 08:55:33 2026
>>> > New Revision: 1934426
>>> >
>>> > Log:
>>> > Use UTF-8 alignement for the 'author' column in the 'svn blame'
>>> command.
>>> >
>>> > * subversion/svn/blame-cmd.c
>>> >   (#include): Add svn_utf_private.h.
>>> >   (print_line_info): Call svn_utf__cstring_utf8_align_right() to
>>> >    prepare author.
>>> >
>>> > Modified:
>>> >    subversion/trunk/subversion/svn/blame-cmd.c
>>> >
>>> > Modified: subversion/trunk/subversion/svn/blame-cmd.c
>>> >
>>> ==============================================================================
>>> > --- subversion/trunk/subversion/svn/blame-cmd.c Wed May 20 08:30:24
>>> 2026        (r1934425)
>>> > +++ subversion/trunk/subversion/svn/blame-cmd.c Wed May 20 08:55:33
>>> 2026        (r1934426)
>>> > @@ -24,6 +24,7 @@
>>> >
>>> >  /*** Includes. ***/
>>> >
>>> > +#include "private/svn_utf_private.h"
>>> >  #include "svn_client.h"
>>> >  #include "svn_error.h"
>>> >  #include "svn_dirent_uri.h"
>>> > @@ -150,8 +151,9 @@ print_line_info(svn_stream_t *out,
>>> >            time_stdout = "
>>>  -";
>>> >          }
>>> >
>>> > -      SVN_ERR(svn_stream_printf(out, pool, "%s %10s %s ", rev_str,
>>> > -                                author ? author : "         -",
>>> > +      SVN_ERR(svn_stream_printf(out, pool, "%s %s %s ", rev_str,
>>> > +                                svn_utf__cstring_utf8_align_right(
>>> > +                                    author ? author : "-", 10, pool),
>>> >                                  time_stdout));
>>>
>>> After this change the output of svn blame is different from before if
>>> there is a very long author name.
>>>
>>> I have tested with an svn compiled about a month ago (the version in
>>> $PATH) and with a brand-new build (in ./subversion/svn). I have prepared a
>>> repo with a file where most lines are authored by "dsg" and the
>>> remaining one by "averylongauthor" (15 characters, ASCII).
>>>
>>> This is my commit #2 by the long author:
>>> [[[
>>> dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn proplist -v
>>> --revprop -r2 ../wc/foo
>>> Unversioned properties on revision 2:
>>>   svn:author
>>>     averylongauthor
>>>   svn:date
>>>     2026-05-20T11:52:35.534418Z
>>>   svn:log
>>>     Modify line 4
>>> ]]]
>>>
>>> Blame before the change above:
>>> [[[
>>> dsg@devi-25-01:~/svn_trunk3$ svn blame ../wc/foo
>>>      1        dsg 1
>>>      1        dsg 2
>>>      1        dsg 3
>>>      2 averylonga Line 4
>>>      1        dsg 5
>>>      1        dsg 6
>>>      1        dsg 7
>>>      1        dsg 8
>>>      1        dsg 9
>>> ]]]
>>> Author names are right adjusted but when overflowing, the first 10
>>> characters are displayed.
>>>
>>> Blame after the change above:
>>> [[[
>>> dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn blame ../wc/foo
>>>      1        dsg 1
>>>      1        dsg 2
>>>      1        dsg 3
>>>      2 longauthor Line 4
>>>      1        dsg 5
>>>      1        dsg 6
>>>      1        dsg 7
>>>      1        dsg 8
>>>      1        dsg 9
>>> ]]]
>>> Author names are right adjusted but when overflowing, the last 10
>>> characters are displayed.
>>>
>>> (I'm aware there are more instances of svn_stream_printf and I haven't
>>> analysed exactly which one is involved here).
>>>
>>> I think we need to keep the precision in the formatting string and use
>>> the _align_left version.
>>>
>>> Kind regards,
>>> Daniel
>>>
>>
>>
>> Agreed, this is a very breaking/broken change. Changes that affect
>> program output need to be discussed on list and tested. This comment caught
>> my attention:
>>
>
>
> I've lost track a little bit of what this change was related to. What's
> the motivation for changing the output format? (Not saying I agree or
> disagree, just trying to get context.)
>

If I understand the idea correctly, it was to ensure that an author name
containing a Unicode character that is encoded using more than one byte in
UTF-8 is correctly aligned in the output. Take the Swedish character A with
two dots (Ä, encoded in UTF-8 as the two bytes 0xC3 0x84) as an example.

With the old code, note how the author name for Line 6 shows only 9
letters (but 10 bytes, since Ä is encoded as two bytes):
[[[
dsg@devi-25-01:~/svn_trunk3$ svn blame ../wc/foo
     1        dsg 1
     1        dsg 2
     1        dsg 3
     2 averylonga Line 4
     1        dsg 5
     3 Äabcdefgh Line 6
     1        dsg 7
     1        dsg 8
     1        dsg 9
]]]
(The above should probably be viewed with a monospace font; there it is clear
that the text "Line 6" starts one position to the left of all other lines.)
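
To illustrate the difference, here is a minimal C sketch (not the
Subversion implementation; utf8_codepoints and pad_right_aligned are
made-up names). It assumes valid UTF-8 and one terminal column per code
point, which is wrong for combining marks and wide characters:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Count UTF-8 code points by skipping continuation bytes (10xxxxxx).
   Assumption: S is valid UTF-8 and each code point takes one column. */
static size_t
utf8_codepoints(const char *s)
{
  size_t n = 0;

  for (; *s; s++)
    if (((unsigned char)*s & 0xC0) != 0x80)
      n++;
  return n;
}

/* Hypothetical helper: right-align S in a field of WIDTH columns,
   padding by code points instead of bytes. The result is written to
   BUF (of size BUFSIZE) and returned. Long strings are not truncated. */
static const char *
pad_right_aligned(char *buf, size_t bufsize, const char *s, size_t width)
{
  size_t cols = utf8_codepoints(s);
  size_t pad = cols < width ? width - cols : 0;

  snprintf(buf, bufsize, "%*s%s", (int)pad, "", s);
  return buf;
}
```

With a plain "%10s", the author "Äabc" (5 bytes) gets 5 spaces of padding
and so occupies only 9 columns; pad_right_aligned gives it 6 spaces and 10
columns, at the cost of an 11-byte field.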

With the change, the code detects that Ä only occupies a single "position",
so we can display an additional letter in the author name, and
(with a monospace font) the columns are perfectly aligned:
[[[
dsg@devi-25-01:~/svn_trunk3$ ./subversion/svn/svn blame ../wc/foo
     1        dsg 1
     1        dsg 2
     1        dsg 3
     2 longauthor Line 4
     1        dsg 5
     3 Äabcdefghi Line 6
     1        dsg 7
     1        dsg 8
     1        dsg 9
]]]

However, the nicer columnar layout of course has a drawback if we are
counting bytes from the start of the line. Previously `cut -c19-` would
extract the contents of the file, but now it would add an extra space
preceding "Line 6". (With multiple double-byte UTF-8 characters, for example
the Swedish ÅÄÖ, which is 6 bytes, part of the author name would be
extracted by cut.)
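
The byte arithmetic can be checked with a small C sketch that mimics
`cut -b19-` on a single line (from_byte is a hypothetical helper, and this
assumes a cut that counts bytes rather than characters):

```c
#include <stddef.h>
#include <string.h>

/* Return the part of LINE starting at the 1-based byte position START,
   or "" if the line is shorter than that. This is roughly what a
   byte-counting `cut -bSTART-` does for one line. */
static const char *
from_byte(const char *line, size_t start)
{
  size_t len = strlen(line);

  return (start >= 1 && start <= len) ? line + (start - 1) : "";
}
```

Applied to the old output line "     3 Äabcdefgh Line 6" with START 19 this
yields "Line 6"; applied to the new output line "     3 Äabcdefghi Line 6"
it yields " Line 6", because the author field is now 11 bytes wide.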

TL;DR: It all depends on whether we want a pretty layout (which is probably
a design goal for svnbrowse) or a column width that is fixed in bytes.

To make matters worse, Windows Terminal makes a mess of more exotic Unicode
characters, outputting blank space for the extra bytes, so the columns are
misaligned. (Some would probably argue that this is expected; let's not go
there.)


>
>
> + * Please note, there might be a little artifact when there is a wider
>> + * character, then the string won't be perfectly aligned.
>>
>>
>> If true, it implies that svn_utf8_width() or whatever the function is
>> called isn't returning correct results.
>>
>> I can't find the discussion about this now but I'd just note that
>> calculating the width of a Unicode string by only looking at individual
>> code points is not correct. Therefore, pruning away individual code points
>> without context in order to get a shorter string is not correct, either.
>> Some Unicode glyphs can use up to 5 code points.
>>
>
>
> I also remember a discussion from several years back. It might be the same
> one you're thinking of. AFK right now but I'll try to find it.
>
> In fact, I'm also confused about the column width and truncation after 10
> characters: I thought it starts with some column width and if a line is
> encountered which has a longer user name that doesn't fit, then the column
> width is increased for that line and all subsequent lines. (The rationale
> was, it's ugly, but better to be accurate than pretty.) Has that changed
> sometime in the last few years?
>

There is code to handle the revision number; see the calculation of
bb->rev_maxlength in blame_receiver. That of course makes moot any argument
above that you could use `cut` with a fixed number of bytes to extract
the author or the file contents.
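
As a rough sketch of why a fixed offset cannot work in general, assume the
revision column has a minimum width and grows to fit the largest revision
number (rev_column_width is a hypothetical helper, and the minimum of 6 is
only read off the sample output above; the real logic in blame_receiver
may differ):

```c
/* Hypothetical: width of the revision column. It is at least MIN_WIDTH
   and grows to fit the decimal digits of MAX_REV (assumed >= 0). */
static int
rev_column_width(long max_rev, int min_width)
{
  int digits = 1;

  while (max_rev >= 10)
    {
      max_rev /= 10;
      digits++;
    }
  return digits > min_width ? digits : min_width;
}
```

Under these assumptions a repository at r2 gets a 6-column revision field,
while one at r1934426 gets 7 columns, which shifts every later column by
one byte.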

Cheers,
Daniel




>
>
> -- Brane
>>
>> Whoever sold us Unicode as a fixed-width encoding was running a pyramid
>> scheme. 😏
>>
>
>
> I have bigger complaints about it than just the pyramid scheme :-)
>
>
