I just spent a very non-pleasant hour looking into why svn.apache.org's
sync fails with UTF-8 errors.  Turns out that apr_xlate_open() was
returning APR_ENOTIMPL and libsvn_subr was kindly ignoring that error
without even a log message.

The error handling logic in question was added in r15379 <->
http://svn.apache.org/r855453.  The log message of that revision
indicates the intent was to fix a memory leak; whether excluding EINVAL
and ENOTIMPL from the error handling block that follows was intentional
is unclear.

So, any objections to the following?

[[[
Follow-up to r15379(r855453): don't discard apr_xlate_open() failures.

The incumbent code just leads to "Unable to translate to/from UTF-8"
errors later on, which renders svnsync unable to sync log messages
containing non-ASCII (even though they can be committed) and fails our
test suite in three places:

    FAIL:  prop_tests.py 22: test prop* handle invalid property names
    FAIL:  svnsync_tests.py 28: copy and reencode non-UTF-8 svn:* props
    FAIL:  svnsync_tests.py 29: copy UTF-8 svn:* props identically

* subversion/libsvn_subr/utf.c
  (xlate_alloc_handle): Have EINVAL and ENOTIMPL return values from
    apr_xlate_open() fall through to the normal error handling.

    Rejigger that handling so that apr_strerror(EINVAL) or
    apr_strerror(ENOTIMPL) are included in the error chain.
    (The incumbent code would report error code E70023, which is
    APR_ENOTIMPL, but mere mortals aren't supposed to know that.)
]]]

[[[
Index: subversion/libsvn_subr/utf.c
===================================================================
--- subversion/libsvn_subr/utf.c        (revision 1501080)
+++ subversion/libsvn_subr/utf.c        (working copy)
@@ -230,9 +230,12 @@ xlate_alloc_handle(xlate_handle_node_t **ret,
 
   if (APR_STATUS_IS_EINVAL(apr_err) || APR_STATUS_IS_ENOTIMPL(apr_err))
     handle = NULL;
-  else if (apr_err != APR_SUCCESS)
+  if (apr_err != APR_SUCCESS)
     {
       const char *errstr;
+      svn_error_t *err;
+      char apr_strerr[512];
+
       /* Can't use svn_error_wrap_apr here because it calls functions in
          this file, leading to infinite recursion. */
       if (frompage == SVN_APR_LOCALE_CHARSET)
@@ -248,7 +251,13 @@ xlate_alloc_handle(xlate_handle_node_t **ret,
                               _("Can't create a character converter from "
                                 "'%s' to '%s'"), frompage, topage);
 
-      return svn_error_create(apr_err, NULL, errstr);
+      /* Just put the error on the stack, since svn_error_create duplicates it
+         later.  APR_STRERR will be in the local encoding, not in UTF-8, 
though.
+       */
+      svn_strerror(apr_err, apr_strerr, sizeof(apr_strerr));
+      return svn_error_create(apr_err, 
+                              svn_error_create(apr_err, NULL, apr_strerr),
+                              errstr);
     }
 
   /* Allocate and initialize the node. */
]]]

The error upon ENOTIMPL looks like this:

    svn: E070023: Can't create a character converter from native encoding to 
'UTF-8'
    svn: E070023: This function has not been implemented on this platform

It appears as soon as svn_utf_cstring_to_utf8() is called --- which is
normally during argv parsing.  The error happens even if the argv
arguments are all ASCII, which effectively adds a new dependency on
apr_xlate_* even for people who use only ASCII.  I assume this new
failure mode for ASCII-only users means we cannot backport this change.

---

Thoughts?

Reply via email to