[ https://issues.apache.org/jira/browse/SVN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16972985#comment-16972985 ]

Nathan Hartman commented on SVN-807:
------------------------------------

Daniel Shahaf tested this issue as follows. See the dev@ mailing list thread
["Issue tracker cleanup: SVN-807"|https://mail-archives.apache.org/mod_mbox/subversion-dev/201911.mbox/%3c20191111163006.ewawdd6af5fnvqo7@tarpaulin.shahaf.local2%3e] (11 Nov 2019).

 
{code:none}
Well, I'm sure there are better ways, but I just did this:
.
 % svnadmin create r
 % vim -b r/db/revprops/0/0
.
and manually added an svn:log property with a value that's invalid UTF-8 [svn:*
properties must use UTF-8 with LF line endings]:
.
 % xxd r/db/revprops/0/0 | vipe
 00000000: 4b20 380a 7376 6e3a 6461 7465 0a56 2032 K 8.svn:date.V 2
 00000010: 370a 3230 3139 2d31 312d 3131 5431 363a 7.2019-11-11T16:
 00000020: 3038 3a30 312e 3334 3437 3434 5a0a 4b20 08:01.344744Z.K 
 00000030: 370a 7376 6e3a 6c6f 670a 5620 330a ffff 7.svn:log.V 3...
                                              ^^^^ 
 00000040: ff0a 454e 440a ..END.
           ^^ 
 %
You can confirm it's invalid:
.
 % iconv -f utf8 < r/db/revprops/0/0 > /dev/null
 iconv: illegal input sequence at position 62
 zsh: exit 1     iconv -f utf8 < r/db/revprops/0/0 > /dev/null
'svn log' gives:
.
 % svn log file://$PWD/r 
 ------------------------------------------------------------------------
 r0 | (no author) | 2019-11-11 16:08:01 +0000 (Mon, 11 Nov 2019) | 1 line
 
 ?\FF?\FF?\FF
 ------------------------------------------------------------------------
 %
So I think we can close it as "Fixed at some point"?
{code}
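The offset that iconv reports can be cross-checked with a short Python sketch. It assumes the file contents are exactly the bytes in the xxd dump above (transcribed by hand here, not read from a real repository) and decodes them strictly as UTF-8; the helper name first_invalid_utf8_offset is hypothetical.

```python
# Bytes of r/db/revprops/0/0, transcribed by hand from the xxd dump above (assumption).
REVPROPS = (
    b"K 8\nsvn:date\nV 27\n2019-11-11T16:08:01.344744Z\n"
    b"K 7\nsvn:log\nV 3\n\xff\xff\xff\nEND\n"
)


def first_invalid_utf8_offset(data: bytes):
    """Hypothetical helper: byte offset of the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on the first bad byte
    except UnicodeDecodeError as err:
        return err.start
    return None


if __name__ == "__main__":
    print(first_invalid_utf8_offset(REVPROPS))  # 62, the same position iconv reports
```

The first 0xFF byte sits at offset 0x3E (62), right after the "V 3" length line of svn:log, which is where iconv stops.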
 


This issue was fixed in r842879 with the addition of
svn_utf_cstring_from_utf8_fuzzy(). Since that change, bytes that cannot be
converted are displayed as escaped hexadecimal codes (such as ?\FF in the
output above) instead of "[unconvertible log msg]". The fix predates
Subversion 1.0.0.
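The escaping approach can be illustrated with a minimal Python sketch. This is not Subversion's C implementation of svn_utf_cstring_from_utf8_fuzzy(); the name fuzzy_convert, the default target charset, and the ?\XX format (chosen to match the svn log output shown above) are assumptions for illustration. Characters the target charset can represent pass through unchanged; every other byte, whether part of an invalid UTF-8 sequence or of a character the target charset lacks, is replaced by its hexadecimal code.

```python
def fuzzy_convert(data: bytes, to_charset: str = "ascii") -> str:
    """Hypothetical sketch: convert UTF-8 bytes for display in to_charset,
    escaping every byte that cannot be converted as ?\\XX (uppercase hex)."""
    # surrogateescape maps each invalid byte to a lone surrogate instead of failing.
    text = data.decode("utf-8", errors="surrogateescape")
    out = []
    for ch in text:
        try:
            ch.encode(to_charset)  # representable in the target charset?
            out.append(ch)
        except UnicodeEncodeError:
            # Recover the original byte(s) behind this character and escape each one.
            raw = ch.encode("utf-8", errors="surrogateescape")
            out.append("".join(f"?\\{b:02X}" for b in raw))
    return "".join(out)


if __name__ == "__main__":
    print(fuzzy_convert(b"\xff\xff\xff"))         # ?\FF?\FF?\FF
    print(fuzzy_convert("café".encode("utf-8")))  # caf?\C3?\A9
```

Escaping per byte keeps the original data recoverable, but the result is hard to read; that trade-off is what the //TRANSLIT suggestion in the issue description addresses.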

Nothing more can be done if the log contains invalid UTF-8, because such byte
sequences cannot be converted to anything meaningful. This is unlikely to
happen under normal circumstances, because Subversion checks log messages for
bad UTF-8 (and also mismatched line endings) at commit time and aborts the
commit if such content is found.
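For illustration only, the kind of commit-time check described above might look like the following Python sketch. check_log_message is a hypothetical name, not a Subversion API; the sketch rejects messages that are not valid UTF-8 or that contain CR characters, since svn:* properties must use LF line endings.

```python
def check_log_message(data: bytes) -> str:
    """Hypothetical sketch: return the decoded message, or raise ValueError
    if it is not valid UTF-8 or does not use LF-only line endings."""
    try:
        text = data.decode("utf-8")  # strict: any invalid sequence raises
    except UnicodeDecodeError as err:
        raise ValueError(f"log message is not valid UTF-8 (byte offset {err.start})") from err
    if "\r" in text:
        raise ValueError("log message must use LF line endings")
    return text


if __name__ == "__main__":
    for msg in (b"Fix typo\n", b"Fix typo\r\n", b"\xff\xff\xff\n"):
        try:
            check_log_message(msg)
            print(msg, "accepted")
        except ValueError as err:
            print(msg, "rejected:", err)
```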

From my reading, it appears that this issue was left open because of a desire
to move svn_utf_cstring_from_utf8_fuzzy() to the Apache Portable Runtime (APR).
If there is still interest in doing that, it should be tracked in a separate
issue.

We are closing this issue as it has been fixed.

> gracefully degrade from failed charset conversion
> -------------------------------------------------
>
>                 Key: SVN-807
>                 URL: https://issues.apache.org/jira/browse/SVN-807
>             Project: Subversion
>          Issue Type: Bug
>    Affects Versions: all
>            Reporter: Karl Fogel
>            Priority: Minor
>             Fix For: unscheduled
>
>         Attachments: 1_brane-utf-8.mbox, 2_ulrich.mbox
>
>
> {noformat:nopanel=true}
> Right now, if a log message contains characters that cannot be
> represented in the client's locale, that log message will simply show
> up as:
>    "[unconvertible log msg]"
> Graceful degradation would be nice here :-).
> See the dev list thread "Re: converting unconvertible UTF-8 data" for
> discussion of possible solutions.
> My first idea was to write a fuzzy converter function that replaces
> every unconverted byte with an escape sequence representing its
> numerical code ("?\XXX" or somesuch).
> Then Ulrich Drepper pointed out that since this data is mainly for
> human consumption, the "//TRANSLIT" behavior of glibc's iconv and GNU
> libiconv would produce more readable output.  We can at least detect
> when we're using one of those iconv's and append that option to the
> to-charset string where appropriate.  (Marcus Comstedt points out that
> some iconv implementations automatically do transliteration for you,
> and don't even tell you whether or not it's happened, which is sort of
> unnerving.)
> However, if you are on a system that doesn't support this, you'll get
> the result above.
> So there are various non-mutually-exclusive steps to take here:
>    - Write the fuzzy function with the escape codes; use it where
>      translit is not available.
>    - Meanwhile, get Subversion doing transliteration where possible
>      (Ulrich may do this).
>    - Possible early fix: make "svn log" accept --force or
>      --message-encoding, so one can make it output the raw bytes or a
>      specific encoding, respectively.
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
