Hi!

As discussed on IRC, if notmuch stores header values in UTF-8, it's safe to decode them to unicode instances here.

Best,
/p

On Mon, Jul 11, 2011 at 08:03:38AM -0700, Carl Worth wrote:
> On Mon, 11 Jul 2011 16:04:17 +0200, Sebastian Spaeth <sebast...@sspaeth.de>
> wrote:
> > The answer is that things are very implicit. notmuch.h speaks of
> > strings but never mentions encodings
>
> Much of this was intentional on my part.
>
> For example, I intentionally avoided restrictions on what could be
> stored as a tag in the database, (other than the terminating character
> implied by "string" of course).
>
> > So, can be document what encoding we are expected to pass in the various
> > APIs
>
> Yes, let's clarify documentation wherever we need to.
>
> > For some of the stuff we read directly from the files, eg
> > arbitrary headers, we can probably be least sure
>
> The headers should be decoded to utf-8, (via
> g_mime_utils_header_decode_text), before being stored in the database.
>
> > but are e.g. the returned tags always utf-8?
>
> No. The tag data is returned exactly as the user presented it.
>
> > I would love to make the python bindings use unicode() instances in
> > cases where we can be sure to actually receive utf-8 encoded strings.
> >
> > Encodings make my brain hurt. Unfortunately one cannot simply ignore
> > them.
>
> I think a lot of the pain here is due to some bad design decisions in
> python itself. Of course, my saying that doesn't make things any easier
> for you.
>
> But do tell me what more we can do to clarify behavior or documentation.
>
> -Carl
>
> --
> carl.d.wo...@intel.com
> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch

From 988a9832d714dfa0f91b2b1185a50acb4a6ca4b5 Mon Sep 17 00:00:00 2001
From: pazz <patricktot...@gmail.com>
Date: Tue, 12 Jul 2011 19:47:39 +0100
Subject: [PATCH 1/8] unicode return value for Message.get_header()

As discussed in IRC, notmuch recodes mailheaders to utf-8, so
we can safely decode them into unicode instances.
---
 bindings/python/notmuch/message.py |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index 763d2c6..4a43a88 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -379,14 +379,16 @@ class Message(object):
 
         :param header: The name of the header to be retrieved.
                        It is not case-sensitive (TODO: confirm).
-        :type header: str
-        :returns: The header value as string
+        :type header: str or unicode instance
+        :returns: The header value as a unicode string
         :exception: :exc:`NotmuchError`
 
                     * STATUS.NOT_INITIALIZED if the message
                       is not initialized.
                     * STATUS.NULL_POINTER, if no header was found
         """
+        if isinstance(header, unicode):
+            header = header.encode('utf-8')
         if self._msg is None:
             raise NotmuchError(STATUS.NOT_INITIALIZED)
 
@@ -394,7 +396,7 @@ class Message(object):
         header = Message._get_header (self._msg, header)
         if header == None:
             raise NotmuchError(STATUS.NULL_POINTER)
-        return header
+        return header.decode('utf-8')
 
     def get_filename(self):
         """Returns the file path of the message file
-- 
1.7.4.1
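
For readers who want to try the encode-on-input, decode-on-output pattern from the patch without a notmuch database, here is a minimal standalone sketch. It is written in Python 3 terms, where `str` plays the role that `unicode` plays in the patch, and it assumes a hypothetical stand-in, `fake_get_header`, in place of the real C call; neither that function nor the dictionary-backed store is part of notmuch.

```python
# Sketch only: `fake_get_header` is a hypothetical stand-in for the C call
# made by the real binding. It is not part of notmuch.
def fake_get_header(headers, name):
    """Mimic a C API: take a byte-string name, return UTF-8 bytes or None."""
    return headers.get(name.lower())


def get_header(headers, name):
    """Accept a text or byte-string header name and return a text value."""
    # Encode text input so the byte-oriented layer always sees bytes.
    if isinstance(name, str):
        name = name.encode('utf-8')
    value = fake_get_header(headers, name)
    if value is None:
        raise LookupError('no such header')
    # Decoding is safe only because the store is assumed to hold UTF-8.
    return value.decode('utf-8')


# Hypothetical stored data: header values already recoded to UTF-8 bytes.
stored = {b'subject': 'Grüße aus Zürich'.encode('utf-8')}
subject = get_header(stored, 'Subject')
```

The decode step relies on the guarantee discussed in the quoted thread that header values are recoded to UTF-8 before they are stored. For data without that guarantee, such as tags, the same decode could raise `UnicodeDecodeError`.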