Hi!

As discussed on IRC: since notmuch stores header values in UTF-8,
it's safe to decode them to unicode instances here.
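For readers on Python 3, where unicode became str and the C library hands back bytes, "safe to decode" means a strict UTF-8 decode cannot fail and loses nothing. This is a minimal sketch that assumes the stored value really is UTF-8. header_bytes is a hypothetical stand-in for what the bindings receive from libnotmuch:

```python
# Hypothetical raw header value as libnotmuch might return it: UTF-8 bytes.
header_bytes = "Grüße aus Köln".encode("utf-8")

# Strict decoding (the default errors="strict") raises UnicodeDecodeError on
# malformed input, so it only succeeds if the bytes really are UTF-8.
header_text = header_bytes.decode("utf-8")

# The conversion is lossless: re-encoding yields the original bytes.
assert header_text.encode("utf-8") == header_bytes
print(header_text)
```

Because strict mode raises on malformed input, a broken UTF-8 invariant would surface as an exception rather than as silently garbled text.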
best,
/p


On Mon, Jul 11, 2011 at 08:03:38AM -0700, Carl Worth wrote:
> On Mon, 11 Jul 2011 16:04:17 +0200, Sebastian Spaeth <sebast...@sspaeth.de> wrote:
> > The answer is that things are very implicit. notmuch.h speaks of
> > strings but never mentions encodings
> 
> Much of this was intentional on my part.
> 
> For example, I intentionally avoided restrictions on what could be
> stored as a tag in the database, (other than the terminating character
> implied by "string" of course).
> 
> > So, can we document what encoding we are expected to pass in the various
> > APIs?
> 
> Yes, let's clarify documentation wherever we need to.
> 
> > For some of the stuff we read directly from the files, eg
> > arbitrary headers, we can probably be least sure
> 
> The headers should be decoded to utf-8, (via
> g_mime_utils_header_decode_text), before being stored in the database.
> 
> > but are e.g. the returned tags always utf-8?
> 
> No. The tag data is returned exactly as the user presented it.
> 
> > I would love to make the python bindings use unicode() instances in
> > cases where we can be sure to actually receive utf-8 encoded strings.
> > 
> > Encodings make my brain hurt. Unfortunately one cannot simply ignore
> > them.
> 
> I think a lot of the pain here is due to some bad design decisions in
> python itself. Of course, my saying that doesn't make things any easier
> for you.
> 
> But do tell me what more we can do to clarify behavior or documentation.
> 
> -Carl
> 
> -- 
> carl.d.wo...@intel.com
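Carl's point about g_mime_utils_header_decode_text can be illustrated with Python's standard library, which performs the same kind of RFC 2047 decoding through email.header. This is a swapped-in equivalent for illustration only, not what notmuch calls internally. The sample header is made up:

```python
from email.header import decode_header, make_header

# A made-up Subject header mixing two RFC 2047 encoded-words (UTF-8 and
# ISO-8859-1) with plain ASCII text in between.
raw_subject = "=?utf-8?q?Gr=C3=BC=C3=9Fe?= aus =?iso-8859-1?q?K=F6ln?="

# decode_header splits the value into (bytes, charset) pairs; make_header
# decodes each pair with its own charset, and str() joins the pieces.
decoded = str(make_header(decode_header(raw_subject)))
print(decoded)

# Once it is a single Unicode string, it can be stored as UTF-8 regardless
# of which charsets were used on the wire.
stored = decoded.encode("utf-8")
```

This is why header values read back from the database can be treated as UTF-8 even when the original message used another charset.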



> _______________________________________________
> notmuch mailing list
> notmuch@notmuchmail.org
> http://notmuchmail.org/mailman/listinfo/notmuch
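Tags, by contrast, come back exactly as they were stored, so a strict UTF-8 decode could raise on them. One possible approach on Python 3 (it does not exist on Python 2) is the surrogateescape error handler, which maps undecodable bytes to lone surrogates and restores them on encode. tag_to_text and text_to_tag are hypothetical helper names, not part of the bindings:

```python
def tag_to_text(raw: bytes) -> str:
    """Decode a raw tag; bytes that are not valid UTF-8 become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def text_to_tag(text: str) -> bytes:
    """Reverse of tag_to_text: restores the exact original bytes."""
    return text.encode("utf-8", errors="surrogateescape")


utf8_tag = "wichtig-ä".encode("utf-8")
latin1_tag = "wichtig-ä".encode("latin-1")  # b'wichtig-\xe4', not valid UTF-8

for raw in (utf8_tag, latin1_tag):
    # Both tags survive the trip to str and back unchanged.
    assert text_to_tag(tag_to_text(raw)) == raw
```

The trade-off: strict decoding suits headers, where notmuch guarantees UTF-8, while surrogateescape keeps arbitrary tag bytes intact at the cost of occasionally handing callers strings that contain lone surrogates.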

From 988a9832d714dfa0f91b2b1185a50acb4a6ca4b5 Mon Sep 17 00:00:00 2001
From: pazz <patricktot...@gmail.com>
Date: Tue, 12 Jul 2011 19:47:39 +0100
Subject: [PATCH 1/8] unicode return value for Message.get_header()

As discussed on IRC, notmuch recodes mail headers to
UTF-8, so we can safely decode them into unicode instances.
---
 bindings/python/notmuch/message.py |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/bindings/python/notmuch/message.py b/bindings/python/notmuch/message.py
index 763d2c6..4a43a88 100644
--- a/bindings/python/notmuch/message.py
+++ b/bindings/python/notmuch/message.py
@@ -379,14 +379,16 @@ class Message(object):
 
         :param header: The name of the header to be retrieved.
                        It is not case-sensitive (TODO: confirm).
-        :type header: str
-        :returns: The header value as string
+        :type header: str or unicode instance
+        :returns: The header value as a unicode string
         :exception: :exc:`NotmuchError`
 
                     * STATUS.NOT_INITIALIZED if the message 
                       is not initialized.
                     * STATUS.NULL_POINTER, if no header was found
         """
+        if isinstance(header, unicode):
+            header = header.encode('utf-8')
         if self._msg is None:
             raise NotmuchError(STATUS.NOT_INITIALIZED)
 
@@ -394,7 +396,7 @@ class Message(object):
         header = Message._get_header (self._msg, header)
         if header == None:
             raise NotmuchError(STATUS.NULL_POINTER)
-        return header
+        return header.decode('utf-8')
 
     def get_filename(self):
         """Returns the file path of the message file
-- 
1.7.4.1
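The patch's pattern (encode a unicode header name to UTF-8 on the way in, decode the value on the way out) can be sketched in Python 3, where isinstance(header, unicode) becomes isinstance(name, str). _c_get_header is a hypothetical stub standing in for the ctypes call into libnotmuch, and NotmuchError here is a local placeholder class rather than the bindings' own:

```python
from typing import Optional, Union

# Hypothetical stand-in for the C side; libnotmuch itself is not used here.
# Like the C API, it deals only in bytes.
_FAKE_HEADERS = {b"subject": "Grüße aus Köln".encode("utf-8")}


def _c_get_header(name: bytes) -> Optional[bytes]:
    return _FAKE_HEADERS.get(name)


class NotmuchError(Exception):
    """Placeholder for the bindings' exception type (hypothetical)."""


def get_header(name: Union[str, bytes]) -> str:
    # Accept text or bytes for the header name; the C side wants UTF-8 bytes.
    if isinstance(name, str):
        name = name.encode("utf-8")
    value = _c_get_header(name)
    if value is None:
        # The patch raises NotmuchError(STATUS.NULL_POINTER) in this case.
        raise NotmuchError("no such header")
    # notmuch stores headers as UTF-8, so a strict decode is safe here.
    return value.decode("utf-8")
```

Callers can then pass either get_header("subject") or get_header(b"subject") and always receive a str.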
