Hi

Sorry, this email ended up rather long.

Summary: I have run a test (see below) on all of the lkml part of the
performance-corpus, and all the changes look expected. So this series
looks good to me.

First note how we do the bodypart insertion: for a mime type of
text/plain we first try the text/plain handler, then a text/* handler,
and finally a */* handler, stopping as soon as one succeeds.

Before this series, when the part is application/octet-stream but is
detected as text/plain, the text/plain handler fails with a "bodypart
insertion error" because notmuch-get-bodypart-text can't get the text
(because it's not officially text). Thus we fall back on the */* handler
and that inserts the part. With this series notmuch-get-bodypart-text
succeeds and we stop.

Thus in most cases the only change is that we don't get a "bodypart
insertion error", but all the text looks the same. In a couple of cases
the text/plain handler wraps lines/replaces ^M by unix newlines, whereas
the */* handler does not. This is an improvement.

There is one more "difference" but I think this is actually something
random. Sometimes when the part is application/tar or application/zip I
get "Bodypart insert error: Symbol's function definition is void:
gnus-recursive-directory-files". If I load gnus this goes away. In my
first batch of tests this only occurred when using this series, but
since then I have reproduced it on mainline. I think something else I
did when setting up the test on mainline caused gnus to be loaded, but I
have not worked out what is going on there.

Finally, the test was as follows.
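
As an aside before getting to the test itself: the handler order
described above can be sketched roughly as follows. This is only an
illustration, not notmuch's actual code. The `my-` function names and
the handler alist are hypothetical, and I am assuming a handler signals
failure either by returning nil or by raising an error.

```elisp
;;; -*- lexical-binding: t -*-
;; Illustrative sketch only: the `my-' names are hypothetical and this
;; is not how notmuch itself is written.
(require 'cl-lib)

(defun my-handler-candidates (content-type)
  "Return the handler keys to try for CONTENT-TYPE, most specific first.
For \"text/plain\" this is (\"text/plain\" \"text/*\" \"*/*\")."
  (let ((major (car (split-string content-type "/"))))
    (list content-type (concat major "/*") "*/*")))

(defun my-insert-bodypart (content-type handlers)
  "Try each candidate handler for CONTENT-TYPE until one succeeds.
HANDLERS is an alist mapping handler keys to functions of no arguments.
A handler fails by returning nil or by signalling an error (assumed).
Return the key of the handler that succeeded, or nil if none did."
  (cl-loop for key in (my-handler-candidates content-type)
           for fn = (cdr (assoc key handlers))
           ;; Treat an error in the handler as a plain failure.
           when (and fn (condition-case nil (funcall fn) (error nil)))
           return key))

;; Before the series: the text/plain handler errors, so */* inserts the part.
(my-insert-bodypart
 "text/plain"
 (list (cons "text/plain" (lambda () (error "bodypart insertion error")))
       (cons "*/*" (lambda () t))))
;; => "*/*"
```

If the text/plain entry is replaced by a handler that returns non-nil,
the same call returns "text/plain" and the later handlers are never
tried, which corresponds to the behaviour with the series applied.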

I downloaded the performance corpus, configured a separate notmuch
config file to use the performance-test/corpus/mail/lkml as the
mailstore, went into notmuch-emacs and to the inbox (which contained all
messages) and ran the following lisp function:

(defun my-save-all-show ()
  (interactive)
  (goto-char (point-min))
  (let ((count 0))
    (while (notmuch-search-find-thread-id)
      (let ((thread-id (notmuch-search-find-thread-id)))
        (setq count (1+ count))
        (message "Thread %s: %s" count thread-id)
        (notmuch-show thread-id)
        (let ((text (buffer-string))
              (coding-system-for-write 'no-conversion))
          (with-temp-file (concat "OUTPUT-" thread-id)
            (insert text)))
        (kill-buffer))
      (notmuch-search-next-thread))))

I moved the OUTPUT files elsewhere, repeated with this series applied,
and then ran diff on the output. This gave 7 threads with a change (each
in an individual message) out of the 16000 threads / 100000 messages. I
looked at each of those individually, as described above.

Best wishes

Mark

On Mon, 14 Mar 2016, David Bremner <da...@tethera.net> wrote:
> David Edmondson <d...@dme.org> writes:
>
>> On Sun, Mar 13 2016, Mark Walters wrote:
>>> However, it would be sensible to get testing in a greater variety of
>>> charsets/encodings
>>
>> Agreed. Does anyone have suggestions on how we might achieve this? A
>> corpus of mail that we could use?
>
> Maybe the notmuch performance corpus, particularly the lkml sample.
>
> grep -R charset= performance-test/corpus/mail/lkml | sed -e 's/^.*charset=//' \
>   -e 's/;.*//' -e 's/"//g' | tr '[A-Z]' '[a-z]' | sort -u
>
> gives
>
> euc-kr
> gb2312
> iso-2022-jp
> iso-2022-jp-2
> iso-8859-1
> iso-8859-14
> iso 8859-15
> iso-8859-15
> iso-8859-1
> iso-8859-2
> iso-8859-6
> iso-8859-7
> iso-8859-9
> koi8-r
> koi8-u
> ks_c_5601-1987
> shift_jis
> unknown
> unknown-8bit
> us-ascii
> utf8
> utf-8
> windows-1250
> windows-1251
> windows-1252
> windows-1255
>
>
> to unpack the corpus
>
> cd performance-test
> make download-corpus
> ./T00-new.sh --large
>
> probably interrupt the test once notmuch-new starts running.
_______________________________________________
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch