I just spent a very non-pleasant hour looking into why svn.apache.org's sync fails with UTF-8 errors. Turns out that apr_xlate_open() was returning APR_ENOTIMPL and libsvn_subr was kindly ignoring that error without even a log message.
The error handling logic in question was added in r15379 <-> http://svn.apache.org/r855453. The log message of that revision indicates the intent was to fix a memory leak; whether excluding EINVAL and ENOTIMPL from the error handling block that follows was intentional is unclear. So, any objections to the following? [[[ Follow-up to r15379(r855453): don't discard apr_xlate_open() failures. The incumbent code just leads to "Unable to translate to/from UTF-8" errors later on, which renders svnsync unable to sync log messages containing non-ASCII (even though they can be committed) and fails our test suite in three places: FAIL: prop_tests.py 22: test prop* handle invalid property names FAIL: svnsync_tests.py 28: copy and reencode non-UTF-8 svn:* props FAIL: svnsync_tests.py 29: copy UTF-8 svn:* props identically * subversion/libsvn_subr/utf.c (xlate_alloc_handle): Have EINVAL and ENOTIMPL return values from apr_xlate_open() fall through to the normal error handling. Rejigger that handling so that apr_strerror(EINVAL) or apr_strerror(ENOTIMPL) are included in the error chain. (The incumbent code would report error code E70023, which is APR_ENOTIMPL, but mere mortals aren't supposed to know that.) ]]] [[[ Index: subversion/libsvn_subr/utf.c =================================================================== --- subversion/libsvn_subr/utf.c (revision 1501080) +++ subversion/libsvn_subr/utf.c (working copy) @@ -230,9 +230,12 @@ xlate_alloc_handle(xlate_handle_node_t **ret, if (APR_STATUS_IS_EINVAL(apr_err) || APR_STATUS_IS_ENOTIMPL(apr_err)) handle = NULL; - else if (apr_err != APR_SUCCESS) + if (apr_err != APR_SUCCESS) { const char *errstr; + svn_error_t *err; + char apr_strerr[512]; + /* Can't use svn_error_wrap_apr here because it calls functions in this file, leading to infinite recursion. */ if (frompage == SVN_APR_LOCALE_CHARSET) @@ -248,7 +251,13 @@ xlate_alloc_handle(xlate_handle_node_t **ret, _("Can't create a character converter from " "'%s' to '%s'"), frompage, topage); - return svn_error_create(apr_err, NULL, errstr); + /* Just put the error on the stack, since svn_error_create duplicates it + later. APR_STRERR will be in the local encoding, not in UTF-8, though. + */ + svn_strerror(apr_err, apr_strerr, sizeof(apr_strerr)); + return svn_error_create(apr_err, + svn_error_create(apr_err, NULL, apr_strerr), + errstr); } /* Allocate and initialize the node. */ ]]] The error upon ENOTIMPL looks like this: svn: E070023: Can't create a character converter from native encoding to 'UTF-8' svn: E070023: This function has not been implemented on this platform It appears as soon as svn_utf_cstring_to_utf8() is called --- which is normally during argv parsing. The error happens even if the argv arguments are all ASCII, which effectively adds a new dependency on apr_xlate_* even for people who use only ASCII. I assume this new failure mode for ASCII-only users means we cannot backport this change. --- Thoughts?