On 25. 5. 26 22:37, Timofei Zhakov wrote:
On Mon, May 25, 2026 at 9:04 PM Branko Čibej wrote:

    I took another look at how 'svn blame' aligns its output.


    static svn_error_t *
    print_line_info(svn_stream_t *out,
                    svn_revnum_t revision,
                    const char *author,
                    const char *date,
                    const char *path,
                    svn_boolean_t verbose,
                    int rev_maxlength,
                    apr_pool_t *pool)
    {
      const char *time_utf8;
      const char *time_stdout;
      const char *rev_str;

      rev_str = SVN_IS_VALID_REVNUM(revision)
        ? apr_psprintf(pool, "%*ld", rev_maxlength, revision)
        : apr_psprintf(pool, "%*s", rev_maxlength, "-");

      if (verbose)
        {
          if (date)
            {
              SVN_ERR(svn_cl__time_cstring_to_human_cstring(&time_utf8,
                                                            date, pool));
              SVN_ERR(svn_cmdline_cstring_from_utf8(&time_stdout, time_utf8,
                                                    pool));

    This converts the timestamp to the locale encoding ...

            }
          else
            {
              /* ### This is a 44 characters long string. It assumes the current
                 format of svn_time_to_human_cstring and also 3 letter
                 abbreviations for the month and weekday names.  Else, the
                 line contents will be misaligned. */
              time_stdout = "                                           -";
            }

          SVN_ERR(svn_stream_printf(out, pool, "%s %10s %s ", rev_str,
                                    author ? author : "      -",
                                    time_stdout));

    So the author stays in UTF-8? The author name is extracted from
    revision properties, and I don't recall whether we enforce UTF-8 in
    svn:author. I know that we do in svn:log.

          if (path)
            SVN_ERR(svn_stream_printf(out, pool, "%-14s ", path));

    And so does the path? The blame-receiver's docstring says nothing
    about that.

        }
      else
        {
          return svn_stream_printf(out, pool, "%s %10.10s ", rev_str,
                                   author ? author : "      -");
        }

      return SVN_NO_ERROR;
    }

    I guess most of the time, the locale encoding is UTF-8 or some other
    Unicode format that's lossless. Otherwise I can't imagine how this
    could work correctly in general.

    What am I missing?



I think all APIs should assume UTF-8 strings (with certain exceptions, such as svn_utf.h itself).

The question, however, is what it would actually do. Since both paths and properties are ultimately stored on disk as binary blobs, they could technically contain anything. But I assume that if a path weren't UTF-8/ASCII, FSFS wouldn't parse it properly, which would mean a corrupted repository.

Path names in the repository are stored as UTF-8, sure. We can assume that they arrive at the callback in the same encoding. It's our code, so we can check.

On the other hand, properties could be anything unless there are specific enforcements, as you say we have for svn:log. But I'm pretty sure it's safe to assume UTF-8 for them if they store text. If they weren't UTF-8, we'd have problems printing them to the console anyway, because the output is converted from UTF-8 to the locale encoding.

Ah, but 'blame' does not – or should not – convert everything from UTF-8 on output. Because 'blame' prints the file contents, and we don't know the encoding of the file contents other than guessing they're something that the locale encoding can handle. So we don't convert them at all. 'diff' has a similar issue, so does 'cat'.


-- Brane
