On Wed, May 20, 2026 at 1:52 PM Ivan Zhakov <[email protected]> wrote:

> On Tue, 19 May 2026 at 16:27, Timofei Zhakov <[email protected]> wrote:
>
>> On Mon, May 18, 2026 at 10:58 PM Timofei Zhakov <[email protected]>
>> wrote:
>> >
>> > Long time has passed since it was coded. So to refresh the context,
>> > whether you were following or not, the utf8-cmdline branch makes the
>> > command-line tools to first convert entire argv into UTF-8 and then
>> > assume normalised strings when handling parameters. A work that
>> > minimises the amount of potentially forgotten-to-convert strings and
>> > brings support for all Unicode symbols in the cmdline on Windows which
>> > is impossible right now.
>> >
>> > There is a test called basic_tests.py:argv_with_best_fit_chars. It
>> > checks that svn rejects Unicode symbols. Functionality which was
>> > illegal before changes introduced in that branch.
>> >
>> > I believe the correct solution would be to adjust test cases to expect
>> > successful execution instead of an error. I've tried to implement that
>> > before by changing arguments in run_and_verify_svn2, but it made me
>> > stuck... I kept getting an error message coming from the logger (?)
>> > that it struggles to convert output to print it.
>> >
>> > I really lack expertise with Python and svn-test to figure out the
>> > correct way to do it. Could anyone who knows it better than I do step
>> > ahead and maybe give it a fresh look from another perspective?
>> >
>> > It's basically the last thing that keeps us from getting this branch
>> > merged so far. I think it's time we finalise it.
>> >
>> > Alternatively I think we might remove that test as "outdated" because
>> > it checks an edge case which is now completely normal after the
>> > utf8ization. I would personally prefer the other option, but it's
>> > worth mentioning this possibility as well.
>> >
>> > Thanks!
>> > On Sat, May 16, 2026 at 3:06 PM Timofei Zhakov <[email protected]>
>> wrote:
>> > >
>> > > On Fri, May 15, 2026 at 4:02 PM Jun Omae <[email protected]> wrote:
>> > > >
>> > > > On 2026/05/15 0:28, Timofei Zhakov wrote:
>> > > > > There is a test called basic_tests.py:argv_with_best_fit_chars. It
>> > > > > checks that svn rejects Unicode symbols. Functionality which was
>> > > > > illegal before changes introduced in that branch.
>> > > >
>> > > > In the branch, svn command receives the arguments as utf-8 bytes,
>> but the
>> > > > output of the pipe is applied best-fit encoding conversion.
>> > > >
>> > > > [[[
>> > > > (diff is truncated)
>> > > > ]]]
>> > >
>> > > I tested this patch and can confirm that it works. I don't know why
>> > > but as far as I remember I was doing exactly the same thing, but it
>> > > didn't work for me.
>> > >
>> > > I remember I once heard that "everything looks like physics if you
>> > > don't know magic". That's exactly the case. Sometimes we just need a
>> > > pair of fresh eyes. :-)
>> > >
>> > > +1 for the changes
>> > >
>> > > > Recently, I'm trying 1.14.x with utf-8 code page using
>> activeCodePage
>> > > > manifest [1]. It almost works fine (e.g. add emoji filenames and
>> checkout,
>> > > > ...) however output to stderr is garbled and not fixed yet.
>> > > >
>> > > > [1]
>> https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
>> > >
>> > > That sounds interesting. If as you are saying output is converted to
>> > > the local encoding, it introduces a lot of inconsistency and yeah we
>> > > have no emojis.
>> > >
>> > > Since it's almost always that the encoding is UTF-8 on the majority of
>> > > Unix systems, I think it makes a lot of sense to take the same
>> > > approach on Windows.
>> >
>> > I saw you committed the patch in r1934334. Thank you so much!
>> >
>> > I guess the CI is back to green! I'll post an email on dev@ about
>> > merging the branch into trunk.
>>
>> Hey guys!
>>
>> I believe that it's time we merge the utf8-cmdline branch. I've
>> already written the merge log message and pasted it below. Hope there
>> is nothing missing because it ended up being pretty massive.
>>
>> +1.
>
> I agree that the change is massive. However, on trunk we're just at the
> beginning of the (1.16.x) release cycle, so it's probably the most
> appropriate moment of time to make a change like this.
>
> I have tested the branch on Windows: works great for me.
>
> I also reviewed the code and didn't find any problems. Please find some
> minor comments inline.
>
> [[[
>> Switch command-line programs to UTF-8 mode.
>>
>> Merging from branch utf8-cmdline-prototype.
>>
>> The Subversion library internally works with UTF-8 encoded strings which
>> simplifies it a lot.  However, commands-line programs receive arguments in
>> local encoding (cstring).  Before this change, we used to convert
>> encoding as
>>
>
> Nitpick: "local**e** encoding".
>
>
>> we handle each individual argument.  But this approach has several
>> limitations;
>>
>> 1)  It's easy to miss some conversions since we need to constantly keep
>> it in
>> mind that the arguments are in non-UTF-8 encoding, unlike the majority of
>> codebase assumes.  Let's keep it consistent and simple.
>>
>> 2)  Even though we could perfectly handle Non-ASCII (Unicode) strings in
>> UTF-8
>> encoding under the hood, cstring squishes them to allow ASCII-only data.
>> Switching command-line to UTF-8 would remove that layer of converting
>> input
>> data back and forth, which means direct conversions from UTF-16 to UTF-8
>> without any complications.
>>
>> Recently, we added a family of svn_cmdline__get_cstring_argv() function to
>> resolve an issue with inconsistent encoding of what we assumed was
>> cstring.
>> Now, it's time to get into UTF-8, so we are introducing
>> svn_cmdline__get_utf8_argv() for that.  It uses
>> svn_utf__win32_utf16_to_utf8()
>> on Windows, and svn_utf_cstring_to_utf8() on Unix.
>>
>> This branch should not bring any behaviour changes on Unix, while Windows
>> code
>> now supports Unicode command-line arguments.
>>
>> * subversion/include/private/svn_cmdline_private.h
>>   (svn_cmdline__win32_get_utf8_argv, svn_cmdline__default_get_utf8_argv):
>>    Declare methods.
>>   (svn_cmdline__get_utf8_argv): New define.
>>
>> * subversion/include/svn_client.h
>>   (svn_client_args_to_target_array3): Declare new functions that do not
>>    perform UTF-8 conversion unlike its old variant.
>>   (svn_client_args_to_target_array2): Deprecate.
>>
>> * subversion/include/svn_opt.h
>>   (svn_opt_args_to_target_array4, svn_opt_parse_revprop2): Declare new
>>    functions that do not perform UTF-8 conversion unlike their old
>> variant.
>>   (svn_opt_args_to_target_array3, svn_opt_parse_revprop): Deprecate.
>>
>> * subversion/libsvn_client/client.h
>>   (svn_client__process_target_array): Declare new private function to put
>>    common code of target collection to later use in either UTF8 or
>> non-UTF8
>>    versions.
>>
>> * subversion/libsvn_client/cmdline.c
>>   (svn_client__process_target_array): Implement function, using body of
>>    former svn_client_args_to_target_array2() function.
>>   (svn_client_args_to_target_array3): Implement by forwarding targets to
>>    svn_client__process_target_array() as they are without any conversions.
>>
>> * subversion/libsvn_client/deprecated.c
>>   (svn_client_args_to_target_array2): Implement backward compatibility
>>    function in a similar way as svn_client_args_to_target_array3(), but
>> also
>>    converting targets to UTF8 encoding.
>>
>> * subversion/libsvn_subr/cmdline.c
>>   (svn_cmdline__win32_get_utf8_argv): New function that receives a
>> collection
>>    of argv in local Windows encoding and converts them to UTF8 using
>>    svn_utf__win32_utf16_to_utf8.
>>   (svn_cmdline__default_get_utf8_argv): New function that ensures the
>> exact
>>    same behaviour in terms of encoding conversions performed as our
>>    command-line code used to do it before on Unix platforms since it is
>> backed
>>    by svn_utf_cstring_to_utf8().
>>
>> * subversion/libsvn_subr/deprecated.c
>>   (svn_opt_args_to_target_array3): Use svn_opt_parse_all_args() method to
>> back
>>    the implementation.
>>   (svn_opt_parse_revprop): Implement wrapper over svn_opt_parse_revprop2
>> with
>>    UTF8 conversion.
>>
>> * subversion/libsvn_subr/opt.c
>>   (svn_opt__args_to_target_array): New function implementation.
>>   (svn_opt_parse_revprop2): Bump version and remove UTF8 conversion.
>>
>> * subversion/svn/auth-cmd.c
>> * subversion/svn/propdel-cmd.c
>> * subversion/svn/propedit-cmd.c
>> * subversion/svn/propget-cmd.c
>> * subversion/svn/propset-cmd.c
>> * subversion/svn/shelf-cmd.c
>> * subversion/svn/shelf2-cmd.c
>>   (svn_cl__auth,
>>    svn_cl__changelist,
>>    svn_cl__propdel,
>>    svn_cl__propedit,
>>    svn_cl__propget,
>>    svn_cl__propedit,
>>    svn_cl__propget,
>>    svn_cl__propset,
>>    get_next_argument): Remove conversion to UTF8 encoding of target
>> arguments.
>>
>> * subversion/svn/svn.c
>>   (parse_compatible_version): Do not convert the encoding of value of the
>>    opt_compatible_version argument.
>>
>>   (sub_main): Use svn_cmdline__get_utf8_argv instead of
>>    svn_cmdline__get_cstring_argv to get normalized arguments.
>>   (sub_main): Read opt_arg directly to utf8_opt_arg and remove all
>> conversion
>>    from cstring to UTF8.
>>   (sub_main): For some options which want cstring, convert utf8_opt_arg
>> back.
>>    Those include --diff-cmd, --merge-cmd, and --editor-cmd.
>>   (sub_main): Use a newer version of svn_opt_parse_revprop,
>> svn_opt_parse_revprop
>>    which does not convert strings to UTF8.
>>   (sub_main): Do not utf8-ize first_arg.
>>   (sub_main): Do not convert the message to UTF8.
>>
>> * subversion/svn/util.c subversion/svn/util.c
>>   (svn_cl__make_log_msg_baton): Consider --encoding argument only when the
>>    message is read from a file, keeping it in UTF8 when --message is
>> utilised
>>    instead.
>>
>>   (svn_cl__args_to_target_array_print_reserved): Use
>>    svn_client_args_to_target_array3 instead of
>> svn_client_args_to_target_array2,
>>    which doesn't convert result to utf8, since we are working with already
>>    converted 'os'.
>>
>> * subversion/svnadmin/svnadmin.c
>>   (parse_args): No need to command-line arguments to UTF8.
>>   (set_revprop): Explicitly state UTF8 encoding when invoking the
>>    svn_subst_translate_string2() since the arguments should already be in
>> this
>>    encoding.  This ensures that no double conversion would be done.
>>   (subcommand_lslocks): Use svn_opt_args_to_target_array4() instead of
>>    svn_opt_args_to_target_array3() to retrieve targets which does not
>> convert
>>    encodings.
>>   (sub_main): Use svn_cmdline__get_utf8_argv() instead of
>>   svn_cmdline__get_cstring_argv() to get UTF8 argv.
>>   (sub_main): Read opt_arg directly to utf8_opt_arg and remove all
>> conversion
>>    from cstring to UTF8.
>>   (sub_main): Do not utf8-ize first_arg.
>>
>> * subversion/svnbench/svnbench.c
>>   (sub_main): Convert args straight to UTF8, remove in-place conversions,
>> and
>>    silently migrate to handle the other arguments as UTF8.
>>
>> * subversion/svnbench/null-list-cmd.c
>>   (includes): Remove svn_utf.h to make sure no more UTF8 conversions are
>> done.
>>
>> * subversion/svnbench/util.c
>>   (svn_cl__args_to_target_array_print_reserved): Use newer version of
>>    svn_client_args_to_target_array(), which doesn't perform UTF8
>> conversion
>>
>>   (sub_main): Directly convert argv to UTF8 and remove all following
>>    conversions.
>>
>> * subversion/svnlook/svnlook.c
>>   (print_diff_tree): Don't convert diff_options to UTF8 twice.
>>   (sub_main): Convert args straight to UTF8, remove in-place conversions,
>> and
>>    handle the other arguments as UTF8.
>>
>> * subversion/svnfsfs/svnfsfs.c
>> * subversion/svnmucc/svnmucc.c
>> * subversion/svndumpfilter/svndumpfilter.c
>> * subversion/svnrdump/svnrdump.c
>> * subversion/svnserve/svnserve.c
>> * subversion/svnversion/svnversion.c
>> * tools/client-side/svn-mergeinfo-normalizer/svn-mergeinfo-normalizer.c
>> * tools/client-side/svnconflict/svnconflict.c
>> * tools/dev/svnraisetreeconflict/svnraisetreeconflict.c
>> * tools/dev/wc-ng/svn-wc-db-tester.c
>> * tools/server-side/svnauthz.c
>>   (includes): Remove svn_utf8.h where possible as we no longer use its
>>    functions.
>>   (sub_main): Convert all args straight to UTF8 and remove later
>> conversion.
>>
>> * subversion/svnsync/svnsync.c
>>   (initialize_cmd, synchronize_cmd, copy_revprops_cmd, info_cmd): Use
>>    svn_opt_args_to_target_array4() instead of
>> svn_opt_args_to_target_array3()
>>    and remove svn_utf_cstring_to_utf8() calls.
>>
>>   (sub_main): Convert args straight to UTF8, remove in-place conversions,
>> and
>>    handle the other arguments as UTF8.
>>
>> * subversion/tests/cmdline/svntest/testcase.py
>>   (_detect_utf8_locale): New local helper.
>>   (RequireUtf8_deco): Add RequireUtf8 decorator that allows test to be
>> run in
>>    UTF8 environment to fix failures on Unix platforms.
>>
>> * subversion/tests/cmdline/basic_tests.py
>>   (): Import RequireUtf8 decorator from testcase.py
>>   (argv_with_best_fit_chars):
>>    Adapt the unit test to the arguments in utf-8 bytes.
>>   (unicode_arguments_test): Use that RequireUtf8 decorator to force the
>> test to
>>    run in the UTF8 environment.
>>
>>
> I am +1 to merge utf8-cmdline branch to trunk.
>
>
>
Thanks for the review!

Completed the merge in r1935602.

> > * subversion/tests/libsvn_subr/opt-test.c
> >  (includes): Add svn_hash.h.
> >  (test_svn_opt_parse_revprop): Add new test.
>
>
> Probably it would be better to #undef  UNICODE_TEST_STRING or just inline
it.

I decided not to change this part yet. But probably yeah but it doesn't
really matter and we've done either way before.

-- 
Timofei Zhakov

Reply via email to