On Fri, 11 Jul 2014, David Bremner <david at tethera.net> wrote:
> Austin Clements <amdragon at MIT.EDU> writes:
>> +This returns the content of the given part as a multibyte Lisp
> What does "multibyte" mean here? utf8? current encoding?

Elisp has two kinds of strings: "unibyte strings" and "multibyte
strings".

You can think of unibyte strings as binary data; they're just vectors of
bytes without any particular encoding semantics (though when you use a
unibyte string you can endow it with encoding).  Multibyte strings,
however, are text; they're vectors of Unicode code points.
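
As a rough illustration of the distinction (a sketch only, not code from
the patch; the function name demo-string-kinds is made up for this
example), encoding a multibyte string yields a unibyte string of bytes,
and decoding those bytes recovers the text:

```elisp
;; Hypothetical demo function; not part of notmuch.
(defun demo-string-kinds ()
  "Return facts about one multibyte string and its UTF-8 encoding."
  (let* ((text "caf\u00e9")                           ; multibyte: 4 characters
         (bytes (encode-coding-string text 'utf-8)))  ; unibyte: 5 bytes
    (list (multibyte-string-p text)
          (length text)
          (multibyte-string-p bytes)
          (length bytes)
          ;; Decoding the bytes as UTF-8 recovers the original text.
          (string= (decode-coding-string bytes 'utf-8) text))))

(demo-string-kinds) ; => (t 4 nil 5 t)
```

The "\u00e9" escape in the string literal is interpreted by the Lisp
reader and forces the literal to be multibyte.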

>> +string after performing content transfer decoding and any
>> +necessary charset decoding.  It is an error to use this for
>> +non-text/* parts."
>> +  (let ((content (plist-get part :content)))
>> +    (when (not content)
>> +      ;; Use show --format=sexp to fetch decoded content
>> +      (let* ((args `("show" "--format=sexp" "--include-html"
>> +                 ,(format "--part=%s" (plist-get part :id))
>> +                 ,@(when process-crypto '("--decrypt"))
>> +                 ,(notmuch-id-to-query (plist-get msg :id))))
>> +         (npart (apply #'notmuch-call-notmuch-sexp args)))
>> +    (setq content (plist-get npart :content))
>> +    (when (not content)
>> +      (error "Internal error: No :content from %S" args))))
>> +    content))
> I'm a bit curious at the lack of setting "coding-system-for-read" here.
> Are we assuming the user has their environment set up correctly? Not so
> much a criticism as being nervous about everything coding-system
> related.

That is interesting.  coding-system-for-read should really go in
notmuch-call-notmuch-sexp, but I worry that, while *almost* all strings
the CLI outputs are UTF-8, not quite all of them are.  For example, we
output filenames exactly as the OS reports the bytes to us (which is
necessary, in a sense, because POSIX enforces no particular encoding on
file names, but still really unfortunate).

We could set coding-system-for-read, but a full solution needs more
cooperation from the CLI.  Possibly the right answer, at least for the
sexp format, is to do our own UTF-8 to "\uXXXX" escapes for strings that
are known to be UTF-8 and leave the raw bytes for the few that aren't.
Then we would set the coding-system-for-read to 'no-conversion and I
think everything would Just Work.
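
To make the effect of coding-system-for-read concrete, here is a minimal
sketch.  It is not the notmuch implementation: the function name
demo-read-lengths is hypothetical, and a temporary file stands in for
the CLI's output, on the assumption that the variable affects file reads
the same way it affects process output.

```elisp
;; Hypothetical demo function; not part of notmuch.
(defun demo-read-lengths ()
  "Read the same 5 UTF-8 bytes with and without decoding.
Return a list of the two resulting buffer lengths."
  (let ((file (make-temp-file "coding-demo"))
        (bytes (encode-coding-string "caf\u00e9" 'utf-8))) ; 5 raw bytes
    (unwind-protect
        (progn
          ;; Write the bytes verbatim.
          (let ((coding-system-for-write 'no-conversion))
            (write-region bytes nil file nil 'silent))
          (list
           ;; Decoded as UTF-8 while reading: 4 characters.
           (with-temp-buffer
             (let ((coding-system-for-read 'utf-8))
               (insert-file-contents file))
             (length (buffer-string)))
           ;; No conversion: the 5 bytes arrive undecoded.
           (with-temp-buffer
             (let ((coding-system-for-read 'no-conversion))
               (insert-file-contents file))
             (length (buffer-string)))))
      (delete-file file))))

(demo-read-lengths) ; => (4 5)
```

With 'no-conversion the bytes reach Lisp untouched, which is why the
CLI would have to emit "\uXXXX" escapes for the strings it knows are
UTF-8.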

That doesn't help for JSON, which is supposed to be all UTF-8 all the
time.  I can think of solutions there, but they're all ugly and involve
things like encoding filenames as base64 when they aren't valid UTF-8.

So...  I don't think I'm going to do anything about this at this moment.

> I didn't see anything else to object to in this patch or the previous
> one.
