Oops, I meant to cc: dev at subversion. Thanks, Daniel. I'm more used to dev at httpd, where there is some magic involving the Reply-To: header.
Here is what I see if I run dbx against svnversion with a stop set in svn_cmdline_fputs:

[2] stopped in svn_cmdline_fputs at line 288 in file "cmdline.c" ($t1)
  288   svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
(dbx64) where
svn_cmdline_fputs(string = "svn: Valid UTF-8 data.(hex: 2e 61 4b).followed by invalid UTF-8 sequence.(hex: a2 a5 95 61).", stream = 0xD6C84F8, pool = 0xE4E4040), line 288 in "cmdline.c"
svn_cmdline_fprintf(stream = 0xD6C84F8, pool = 0xE4E4040, fmt = "%s%s.", ...), line 284 in "cmdline.c"
print_error.$b20, line 332 in "error.c"
print_error(err = 0xE4E4078, stream = 0xD6C84F8, prefix = "svn: "), line 332 in "error.c"
svn_handle_error2.$b25.$b29, line 405 in "error.c"
svn_handle_error2.$b25, line 405 in "error.c"
svn_handle_error2(err = 0xE4E4078, stream = 0xD6C84F8, fatal = 0, prefix = "svn: "), line 405 in "error.c"
main.$b17.$b18, line 216 in "main.c"
main.$b17, line 216 in "main.c"
main(argc = 1, argv = 0xDF76878), line 216 in "main.c"

Any strings that are printable with z/OS dbx "where" or "p" are in the native EBCDIC encoding. If I stop in svn_handle_error2 and print the err structure, I see:

(dbx64) p *err
(apr_err = 121, message = "Valid UTF-8 data.(hex: 2e 61 4b).followed by invalid UTF-8 sequence.(hex: a2 a5 95 61)", child = 0x0, pool = 0xE4E4040, file = "./subversion/libsvn_subr/utf.c", line = 632)

...so we have native strings here which never become UTF-8. I could patch print_error() to do the UTF-8 conversion prior to calling svn_cmdline_fputs(), but the back-to-back conversions seem silly. Maybe it would be better to define svn_cmdline_fputs_native_cstring() or some such and call that from print_error() and any other caller that passes native strings.
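To make the idea concrete, here is a rough sketch of what such a function might boil down to. It is plain C only and deliberately avoids the real Subversion and APR types: the name fputs_native_cstring() and the errno-style return value are assumptions for illustration, and a real svn_cmdline_fputs_native_cstring() would presumably take an apr_pool_t and return an svn_error_t * like its neighbours in cmdline.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-in for the proposed
   svn_cmdline_fputs_native_cstring().  Returns 0 on success or an
   errno-style code on failure (an assumption; the real function would
   presumably return an svn_error_t *). */
static int
fputs_native_cstring(const char *string, FILE *stream)
{
  assert(string != NULL && stream != NULL);

  /* "string" is already in the native code page, so it is written
     as-is, with no UTF-8 -> native conversion. */

  /* errno is reset first so that a stale value is not reported if
     fputs() fails without setting it. */
  errno = 0;
  if (fputs(string, stream) == EOF)
    return errno ? errno : EIO;

  return 0;
}
```

A caller such as print_error() would then pass its already-native message straight through, for example fputs_native_cstring("svn: some native text\n", stderr), while callers holding UTF-8 data would keep using svn_cmdline_fputs().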
Greg

On Wed, May 12, 2010 at 10:42 AM, Greg Ames <ames.g...@gmail.com> wrote:

> On Wed, May 12, 2010 at 12:59 AM, Daniel Shahaf
> <d...@daniel.shahaf.name> wrote:
>
>> Greg Ames wrote on Tue, 11 May 2010 at 19:36 -0400:
>> > The error messages are in the native code page to start with, so running
>> > them through a UTF-8 -> native conversion doesn't do anything helpful.
>> >
>> ...
>> > Index: subversion/libsvn_subr/cmdline.c
>> > ===================================================================
>> > --- subversion/libsvn_subr/cmdline.c (revision 943316)
>> > +++ subversion/libsvn_subr/cmdline.c (working copy)
>> > @@ -318,24 +318,15 @@
>> >  svn_error_t *
>> >  svn_cmdline_fputs(const char *string, FILE* stream, apr_pool_t *pool)
>> >  {
>> > -  svn_error_t *err;
>> > -  const char *out;
>> > +  /* "string" is native. do not try to convert from UTF-8 */
>>
>> The doc string of this function (see subversion/include/svn_cmdline.h)
>> specifically promises that it'll do conversion from UTF-8.
>
> ok, but
>
> a) that's not appropriate for error messages
> b) it's not enforced.
>
>> We cannot make it unconditionally do the opposite.
>
> I have done exactly that with good results
>
>> (Perhaps with suitable #ifdef's we could do it; or perhaps your problem
>> can be fixed elsewhere (e.g., the error-printing code).)
>
> The SVN_ERR() macro and supporting functions produce native strings, not
> UTF-8, and they are widely used.
>
>> Is your issue only with the encoding of error messages?
>
> This patch addresses only the encoding of error messages. There are a few
> other places where there is confusion about the encoding of input or
> literals.
>
>> Or with the encoding of all svn output?
>
> I think it's a great idea to have svn metadata and text files in the
> repository in UTF-8 to promote universal access. But error messages are
> local and shouldn't be munged much or Bad Things can happen. Yes, someone
> could inject code after SVN_ERR() to convert all the literal strings and
> characters in error messages throughout subversion to UTF-8. But what's the
> point of doing that then converting it back to native to write to stderr?
> And what are the odds of picking up 100% of the literal strings and
> characters and doing exactly one UTF-8 conversion on all of them prior to
> calling svn_cmdline_fputs()? Simplicity is good, especially in error
> situations, and it saves a few cycles on non-UTF-8 systems.
>
> Greg