Joseph Myers <[email protected]>:
> Concretely, what I'd suggest is: convert ISO-8859-1 entries in the
> checked-in list to UTF-8, removing anything that thereby becomes a
> duplicate or unnecessary; handle anything whose encoding isn't simply
> ISO-8859-1 or UTF-8 via a hardcoded entry in bugdb.py using hex escapes
> like the existing such entries there. Once the checked-in list is pure
> UTF-8 it's easier for people to review and edit. Where the issue is only the
> presence of ISO-8859 NBSP, or "" or () around the names, remove that in
> the checked-in list and again remove duplicates. That way the list can be
> limited to non-encoding variations.
Be aware that reposurgeon has a "transcode" command for moving
a specified set of objects to UTF-8 from a specified encoding.
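As a rough illustration of the list cleanup Joseph describes (re-encode ISO-8859-1 entries as UTF-8, drop NBSPs and "" or () around names, then remove duplicates), a Python pass might look like this. It is a sketch under hypothetical assumptions: one name per line, read as raw bytes. The helper names decode_entry, clean_name and normalize are invented here and are not part of bugdb.py or reposurgeon.

```python
"""Hypothetical normalization pass for a checked-in name list.

Assumes one name per line, read as raw bytes; the real file format
used alongside bugdb.py may differ.
"""

NBSP = "\u00a0"  # ISO-8859-1 / Unicode no-break space


def decode_entry(raw: bytes) -> str:
    """Decode a line as UTF-8, falling back to ISO-8859-1.

    ISO-8859-1 maps every byte, so the fallback never fails; an entry in
    any other encoding would be silently mis-decoded and still needs a
    hand-written hex-escape entry in bugdb.py.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("iso-8859-1")


def clean_name(name: str) -> str:
    """Turn NBSPs into spaces and strip one pair of "" or () around the name."""
    name = name.replace(NBSP, " ").strip()
    if len(name) >= 2 and (
        (name[0] == '"' and name[-1] == '"')
        or (name[0] == "(" and name[-1] == ")")
    ):
        name = name[1:-1].strip()
    return name


def normalize(lines: list[bytes]) -> list[str]:
    """Return cleaned entries in original order, dropping empties and duplicates."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in lines:
        name = clean_name(decode_entry(raw.rstrip(b"\r\n")))
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


if __name__ == "__main__":
    entries = [
        "Jos\u00e9 P\u00e9rez".encode("utf-8"),
        "Jos\u00e9 P\u00e9rez".encode("iso-8859-1"),
        '"Jos\u00e9 P\u00e9rez"'.encode("utf-8"),
        "Jos\u00e9\u00a0P\u00e9rez".encode("iso-8859-1"),
    ]
    print(normalize(entries))  # ['José Pérez']
```

Trying UTF-8 first matters: ISO-8859-1 decoding accepts any byte sequence, so trying it first would mangle entries that are already valid UTF-8. The reverse case, ISO-8859-1 text that happens to also be valid UTF-8, is rare for names but possible, which is one reason hand-checked hex-escape entries in bugdb.py remain useful.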
--
Eric S. Raymond <http://www.catb.org/~esr/>