Control: tags -1 pending

On 2014-02-12 08:56:03, Felix Dreissig wrote:
> Package: monkeysign
> Version: 2.x
> Severity: normal
>
> I wanted to build the manpage only for Monkeysign’s CLI version, so I removed
> `monkeyscan:monkeysign.gtkui:MonkeysignScanUi.parser` from 'setup.cfg' and
> ran `setup.py build_manpage`.
> That failed with:
>
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 55:
>> ordinal not in range(128)
>
> An encoding problem didn’t make any sense to me, so I tried to track the
> issue down. Turns out it doesn’t occur when PyGTK is imported into the build
> process, either directly through 'gtkui.py' or via 'msg_exception.py'.
> The explanation for this behaviour is that PyGTK sets Python’s default
> encoding to UTF-8. This is GNOME bug 132040 from back in 2004:
> https://bugzilla.gnome.org/show_bug.cgi?id=132040
>
> So what exactly causes the above error?
> It is the accent in your surname, anarcat, that causes manpage writing to
> fail with ASCII encoding ;-). The best way to fix this would in my opinion be
> using a unicode string for `author` in 'setup.py', but Distutils seems not to
> respect that.
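A minimal Python 3 sketch of why the build fails. Python 2 silently decodes byte strings with sys.getdefaultencoding() (normally 'ascii') whenever they are mixed with unicode strings. The hypothetical helper coerce_like_python2() makes that implicit step explicit, and the author name is a made-up example, not the real value from setup.py.

```python
def coerce_like_python2(raw: bytes, default_encoding: str = "ascii") -> str:
    # Hypothetical helper: stands in for Python 2's implicit str -> unicode
    # coercion, which decodes with the interpreter's default encoding.
    return raw.decode(default_encoding)


# Made-up author name, stored as UTF-8 bytes like a Python 2 str literal.
author = "René Example".encode("utf-8")

try:
    coerce_like_python2(author)
except UnicodeDecodeError as exc:
    # 'ascii' codec can't decode byte 0xc3 in position 3: ordinal not in range(128)
    print(exc)

# With UTF-8 as the default (what importing PyGTK effectively did), it decodes.
print(coerce_like_python2(author, "utf-8"))
```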
Damn French. ;)
I agree that author should be unicode; no idea why distutils is
dropping that on the floor. Oh well.
> I used the following patch, which works:
>
>> --- a/monkeysign/documentation.py
>> +++ b/monkeysign/documentation.py
>> @@ -84,7 +84,7 @@ class build_manpage(Command):
>>      def _write_footer(self, parser):
>>          ret = []
>>          appname = self.distribution.get_name()
>> -        author = '%s <%s>' % (self.distribution.get_author(),
>> +        author = '%s <%s>' % (self.distribution.get_author().decode('utf-8'),
>>                                self.distribution.get_author_email())
>>          ret.append(('.SH AUTHORS\n.B %s\nwas written by %s.\n'
>>                      % (self._markup(appname), self._markup(author))))
>> @@ -109,7 +109,7 @@ class build_manpage(Command):
>>          path = os.path.join(self.output, parser.prog + '.1')
>>          self.announce('writing man page to %s' % path, 2)
>>          stream = open(path, 'w')
>> -        stream.write(''.join(manpage))
>> +        stream.write(''.join(manpage).encode('utf-8'))
>>          stream.close()
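The two changes in that patch (decode on input, encode on output) can be sketched in Python 3 as follows. This is an illustration, not Monkeysign's code: write_footer() and write_manpage() are hypothetical stand-ins for the patched methods, the _markup() escaping is omitted, and the author name is passed as UTF-8 bytes to mimic the byte string the patch decodes.

```python
import os
import tempfile


def write_footer(appname: str, author: bytes, author_email: str) -> str:
    # Decode the name explicitly, as the patch does with
    # get_author().decode('utf-8'), so no default codec is involved.
    author_text = "%s <%s>" % (author.decode("utf-8"), author_email)
    return ".SH AUTHORS\n.B %s\nwas written by %s.\n" % (appname, author_text)


def write_manpage(path: str, manpage: list) -> None:
    # Encode explicitly on output, like ''.join(manpage).encode('utf-8').
    with open(path, "wb") as stream:
        stream.write("".join(manpage).encode("utf-8"))


# Hypothetical output path and author, for illustration only.
out = os.path.join(tempfile.mkdtemp(), "example.1")
footer = write_footer("monkeysign", "René Example".encode("utf-8"), "rene@example.org")
write_manpage(out, [".TH MONKEYSIGN 1\n", footer])
```

Choosing the codec at both boundaries means the result no longer depends on whether some imported module (PyGTK here) has changed the interpreter's default encoding.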
I used a slight variation: I decode in the ret.append() call so that the
email address can also contain accents. That may be illegal, but I don't
care: I'm not going to go enforcing standards here, I just want to avoid
crashes at build time. :)
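One way to read that variation, again as a Python 3 sketch with a hypothetical footer() function: both fields stay bytes while the "Name <email>" string is assembled, and the result is decoded once, so an accent in either the name or the address is handled.

```python
def footer(appname: str, author: bytes, author_email: bytes) -> str:
    # Assemble "Name <email>" while both parts are still bytes
    # (bytes %-formatting needs Python 3.5+), then decode the whole
    # string once so accents in either field are handled.
    combined = b"%s <%s>" % (author, author_email)
    return ".SH AUTHORS\n.B %s\nwas written by %s.\n" % (appname, combined.decode("utf-8"))


print(footer("monkeysign", "René".encode("utf-8"), "rené@example.org".encode("utf-8")))
```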
> It might, however, not be the most comprehensive way to deal with the issue:
> The whole process of generating manpages uses a mixture of ordinary and
> unicode strings and might need some review with respect to encoding issues.
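A common remedy for that kind of mixture (a general technique, not something this thread settled on) is to convert every fragment to text at one boundary before joining. A minimal sketch with a hypothetical to_text() helper, assuming all byte strings are UTF-8:

```python
def to_text(value, encoding: str = "utf-8") -> str:
    # Hypothetical helper: decode bytes explicitly, pass text through.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Fragments of mixed types, as a man page builder might collect them.
parts = [b".SH AUTHORS\n", "was written by ", "René Example".encode("utf-8"), ".\n"]
page = "".join(to_text(p) for p in parts)
```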
True. This was messy in the first place, although I am not sure I want
to pursue this much further. :P
Thanks for all the patches and help!
a.
--
That's the kind of society I want to build. I want a guarantee - with
physics and mathematics, not with laws - that we can give ourselves
real privacy of personal communications.
- John Gilmore

