[ https://issues.apache.org/jira/browse/SVN-807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julian Foad updated SVN-807:
----------------------------
Component/s: (was: src)
> gracefully degrade from failed charset conversion
> -------------------------------------------------
>
> Key: SVN-807
> URL: https://issues.apache.org/jira/browse/SVN-807
> Project: Subversion
> Issue Type: Bug
> Affects Versions: all
> Reporter: Karl Fogel
> Priority: Minor
> Fix For: unscheduled
>
> Attachments: 1_brane-utf-8.mbox, 2_ulrich.mbox
>
>
> {noformat:nopanel=true}
> Right now, if a log message contains characters that cannot be
> represented in the client's locale, that log message will simply show
> up as:
> "[unconvertible log msg]"
> Graceful degradation would be nice here :-).
> See the dev list thread "Re: converting unconvertible UTF-8 data" for
> discussion of possible solutions.
> My first idea was to write a fuzzy converter function that replaces
> every unconverted byte with an escape sequence representing its
> numerical code ("?\XXX" or somesuch).
> Then Ulrich Drepper pointed out that since this data is mainly for
> human consumption, the "//TRANSLIT" behavior of glibc's iconv and GNU
> libiconv would produce more readable output. We can at least detect
> when we're using one of those iconv implementations and append that option to the
> to-charset string where appropriate. (Marcus Comstedt points out that
> some iconv implementations automatically do transliteration for you,
> and don't even tell you whether or not it's happened, which is sort of
> unnerving.)
> However, if you are on a system that doesn't support this, you'll still
> get the "[unconvertible log msg]" placeholder shown above.
> So there are various non-mutually-exclusive steps to take here:
> - Write the fuzzy function with the escape codes, use where
> translit not available.
> - Meanwhile, get Subversion doing transliteration where possible
> (Ulrich may do)
> - Possible early fix: make "svn log" accept --force or
> --message-encoding, so one can make it output the raw bytes or a
> specific encoding, respectively.
> {noformat}
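
The "fuzzy converter" idea from the description can be sketched roughly as follows. This is only an illustration in Python, not Subversion's code: the function name fuzzy_encode is hypothetical, and the three-digit decimal form of the "?\XXX" escape is an assumption, since the description only says "numerical code". It also assumes the target charset is ASCII-compatible and stateless (for example US-ASCII or ISO-8859-1), because each character is converted on its own.

```python
def fuzzy_encode(utf8_bytes: bytes, target: str) -> bytes:
    """Convert UTF-8 bytes to `target`, escaping whatever cannot be converted.

    Hypothetical helper: every byte of an unconvertible character is
    emitted as ?\\DDD, where DDD is the byte value in decimal (assumed format).
    """
    # surrogateescape keeps invalid UTF-8 bytes around as lone surrogates,
    # so they can be recovered and escaped instead of aborting the decode.
    text = utf8_bytes.decode("utf-8", errors="surrogateescape")
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode(target)
        except UnicodeEncodeError:
            # Recover the original UTF-8 byte(s) of this character.
            raw = ch.encode("utf-8", errors="surrogateescape")
            out += "".join("?\\%03d" % b for b in raw).encode("ascii")
    return bytes(out)


if __name__ == "__main__":
    message = "caf\u00e9".encode("utf-8")
    print(fuzzy_encode(message, "ascii"))    # b'caf?\\195?\\169'
    print(fuzzy_encode(message, "latin-1"))  # b'caf\xe9'
```

With this approach the convertible part of a log message stays readable and only the offending bytes are replaced, rather than the whole message collapsing to "[unconvertible log msg]". Transliteration via "//TRANSLIT", where the iconv in use supports it, would still give more readable output; the escape form is the fallback.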