[ https://issues.apache.org/jira/browse/SVN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972982#comment-16972982 ]
Nathan Hartman commented on SVN-807:
------------------------------------

Daniel Shahaf tested this issue as follows:

{quote}
Well, I'm sure there are better ways, but I just did this:

{noformat}
% svnadmin create r
% vim -b r/db/revprops/0/0
{noformat}

and manually added an svn:log property with a value that's invalid UTF-8 [svn:* properties must use UTF-8 with LF line endings]:

{noformat}
% xxd r/db/revprops/0/0 | vipe
00000000: 4b20 380a 7376 6e3a 6461 7465 0a56 2032  K 8.svn:date.V 2
00000010: 370a 3230 3139 2d31 312d 3131 5431 363a  7.2019-11-11T16:
00000020: 3038 3a30 312e 3334 3437 3434 5a0a 4b20  08:01.344744Z.K 
00000030: 370a 7376 6e3a 6c6f 670a 5620 330a ffff  7.svn:log.V 3...
                                             ^^^^
00000040: ff0a 454e 440a                           ..END.
          ^^
%
{noformat}

You can confirm it's invalid:

{noformat}
% iconv -f utf8 < r/db/revprops/0/0 > /dev/null
iconv: illegal input sequence at position 62
zsh: exit 1 iconv -f utf8 < r/db/revprops/0/0 > /dev/null
{noformat}

'svn log' gives:

{noformat}
% svn log file://$PWD/r
------------------------------------------------------------------------
r0 | (no author) | 2019-11-11 16:08:01 +0000 (Mon, 11 Nov 2019) | 1 line
?\FF?\FF?\FF
------------------------------------------------------------------------
%
{noformat}

So I think we can close it as "Fixed at some point"?
{quote}

See the dev@ mailing list thread ["Issue tracker cleanup: SVN-807"|https://mail-archives.apache.org/mod_mbox/subversion-dev/201911.mbox/%3c20191111163006.ewawdd6af5fnvqo7@tarpaulin.shahaf.local2%3e] (11 Nov 2019).

This issue was fixed in r842879 with the addition of *svn_utf_cstring_from_utf8_fuzzy()*. With this change, bytes that cannot be converted are displayed as hexadecimal escape codes (such as ?\FF), as shown above. This was done before Subversion 1.0.0. Nothing more can be done if a log message contains invalid UTF-8, because such bytes cannot be converted to anything meaningful.
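The escaping seen in the test above can be illustrated with a short Python sketch. This is not Subversion's C implementation of svn_utf_cstring_from_utf8_fuzzy(); the error-handler name fuzzy_escape and the helpers fuzzy_from_utf8() and first_invalid_utf8_offset() are hypothetical, and the ?\XX format (two uppercase hex digits per byte) is modeled only on the svn log output shown above. The offset helper mirrors the iconv check, which reports the first bad byte at position 62 of the hand-edited revprops file.

```python
import codecs
from typing import Optional


def _escape_invalid_bytes(exc: UnicodeError):
    """codecs error handler: replace each undecodable byte with '?\\XX'."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    # Two uppercase hex digits per byte, e.g. 0xFF -> "?\FF".
    return "".join("?\\%02X" % b for b in bad), exc.end


# Hypothetical handler name (not a Subversion API), registered at import time.
codecs.register_error("fuzzy_escape", _escape_invalid_bytes)


def fuzzy_from_utf8(data: bytes) -> str:
    """Decode UTF-8 for display, escaping bad bytes instead of failing."""
    return data.decode("utf-8", errors="fuzzy_escape")


def first_invalid_utf8_offset(data: bytes) -> Optional[int]:
    """Offset of the first invalid byte (like iconv's report), or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# Revprops file rebuilt from the xxd dump above; svn:log is three 0xFF bytes.
REVPROPS = (b"K 8\nsvn:date\nV 27\n2019-11-11T16:08:01.344744Z\n"
            b"K 7\nsvn:log\nV 3\n\xff\xff\xff\nEND\n")

print(first_invalid_utf8_offset(REVPROPS))  # 62
print(fuzzy_from_utf8(b"\xff\xff\xff"))     # ?\FF?\FF?\FF
```

Because the decoder resumes at exc.end after each replacement, valid UTF-8 around the damaged bytes still decodes normally, so only the bad part of a message is escaped.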
It is unlikely that this would happen under normal circumstances because Subversion checks log messages for bad UTF-8 (and also mismatched line endings) at commit time and aborts the commit if such content is found.

From my reading it appears that this issue was left open because of a desire to move svn_utf_cstring_from_utf8_fuzzy() to Apache Portable Runtime (APR). If there is still interest in doing that, it should be tracked in a separate issue.

We are closing this issue as it has been fixed.

> gracefully degrade from failed charset conversion
> -------------------------------------------------
>
>                 Key: SVN-807
>                 URL: https://issues.apache.org/jira/browse/SVN-807
>             Project: Subversion
>          Issue Type: Bug
>    Affects Versions: all
>            Reporter: Karl Fogel
>            Priority: Minor
>             Fix For: unscheduled
>
>         Attachments: 1_brane-utf-8.mbox, 2_ulrich.mbox
>
>
> {noformat:nopanel=true}
> Right now, if a log message contains characters that cannot be
> represented in the client's locale, that log message will simply show
> up as:
>    "[unconvertible log msg]"
> Graceful degradation would be nice here :-).
> See the dev list thread "Re: converting unconvertible UTF-8 data" for
> discussion of possible solutions.
> My first idea was to write a fuzzy converter function that replaces
> every unconverted byte with an escape sequence representing its
> numerical code ("?\XXX" or somesuch).
> Then Ulrich Drepper pointed out that since this data is mainly for
> human consumption, the "//TRANSLIT" behavior of glibc's iconv and GNU
> libiconv would produce more readable output. We can at least detect
> when we're using one of those iconv's and append that option to the
> to-charset string where appropriate. (Marcus Comstedt points out that
> some iconv implementations automatically do transliteration for you,
> and don't even tell you whether or not it's happened, which is sort of
> unnerving.)
> However, if you are on a system that doesn't support this, you'll get
> the result above.
> So there are various non-mutually-exclusive steps to take here:
>    - Write the fuzzy function with the escape codes, use where
>      translit not available.
>    - Meanwhile, get Subversion doing transliteration where possible
>      (Ulrich may do)
>    - Possible early fix: make "svn log" accept --force or
>      --message-encoding, so one can make it output the raw bytes
>      or a specific encoding, respectively.
> {noformat}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
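The first two steps proposed in the original report (transliterate where possible, fall back to escape codes otherwise) can be combined into one per-character degradation chain. The Python sketch below assumes a charset name such as "ascii" or "latin-1" stands in for the client's locale. Python has no //TRANSLIT option, so Unicode NFKD decomposition with combining marks stripped is swapped in as a much cruder stand-in for glibc/GNU libiconv transliteration. All function names are hypothetical, and the ?\XX escape format again mimics the svn log output from Daniel Shahaf's test rather than any documented Subversion API.

```python
import unicodedata
from typing import Optional

# Hypothetical helper names; not a Subversion or iconv API.


def transliterate_char(ch: str, charset: str) -> Optional[str]:
    """Crude stand-in for //TRANSLIT: NFKD-decompose, drop combining marks.

    Returns None when the stripped result is empty or still not encodable.
    """
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not base:
        return None
    try:
        base.encode(charset)
    except UnicodeEncodeError:
        return None
    return base


def escape_char(ch: str) -> str:
    """Last resort: one '?\\XX' escape per UTF-8 byte of the character."""
    return "".join("?\\%02X" % b for b in ch.encode("utf-8"))


def degrade_for_locale(text: str, charset: str, translit: bool = True) -> str:
    """Prepare text for a target charset, degrading per character.

    Characters the charset can represent pass through unchanged; others are
    transliterated if translit is True and that succeeds, else escaped.
    """
    out = []
    for ch in text:
        try:
            ch.encode(charset)
        except UnicodeEncodeError:
            approx = transliterate_char(ch, charset) if translit else None
            out.append(approx if approx is not None else escape_char(ch))
        else:
            out.append(ch)
    return "".join(out)


print(degrade_for_locale("café €5", "ascii"))                  # cafe ?\E2?\82?\AC5
print(degrade_for_locale("café €5", "ascii", translit=False))  # caf?\C3?\A9 ?\E2?\82?\AC5
print(degrade_for_locale("café €5", "latin-1"))                # café ?\E2?\82?\AC5
```

Working one character at a time means a single unrepresentable character (here the euro sign) degrades on its own instead of replacing the whole message with "[unconvertible log msg]"; the translit=False path corresponds to a system where transliteration is not available.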