On 4/9/21 5:55 AM, Mark Dale via Mailman-Users wrote:
>
> In the archive's downloaded .txt (and also .gz) file, the non-ascii
> characters are missing and displayed as "?".
...
> Any advice on getting the non-ascii characters written into the archive .txt
> file would be gratefully received.
The message is prepared for the .txt file by the Article.as_text() method in
HyperArch.py
<https://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/Mailman/Archiver/HyperArch.py#L563>.
To do email address obfuscation in the message body, which it does whether
or not ARCHIVER_OBSCURES_EMAILADDRS is True, the method first converts the
body to unicode using the charset of the list's language and then, after
possible obfuscation, converts it back, again using the charset of the
list's language. Both conversions use `errors='replace'`, which replaces any
character not in the charset; in the case of ascii, the result is `?`.

One way to avoid this replacement would be to change the charset for
English from ascii to utf-8. See <https://wiki.list.org/x/15958250>.

This isn't a complete solution when the original message encodes the
non-ascii characters in a charset other than `utf-8`, e.g., `iso-8859-1`,
but it will probably handle most cases.

-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
------------------------------------------------------
Mailman-Users mailing list -- mailman-users@python.org
To unsubscribe send an email to mailman-users-le...@python.org
https://mail.python.org/mailman3/lists/mailman-users.python.org/
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: https://www.mail-archive.com/mailman-users@python.org/
    https://mail.python.org/archives/list/mailman-users@python.org/
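The decode/encode round trip described in the reply above can be illustrated with a minimal Python 3 sketch. The `round_trip()` helper is hypothetical and is not the actual `Article.as_text()` code (Mailman 2.1 runs on Python 2, and the real method does more than this). It mimics only the two conversions that use `errors='replace'`, showing why an ascii list charset turns a UTF-8 `é` into `??`, why a utf-8 list charset preserves it, and why an iso-8859-1 encoded body still loses the character.

```python
def round_trip(body: bytes, list_charset: str) -> bytes:
    """Hypothetical simplification of the archiver's charset round trip.

    Not the real HyperArch.py code; it only mimics the decode and encode
    steps, both done with errors='replace'.
    """
    # Decode with the list language's charset; undecodable bytes become U+FFFD.
    text = body.decode(list_charset, errors="replace")
    # (Email address obfuscation would happen here; omitted in this sketch.)
    # Encode back with the same charset; unencodable characters become '?'.
    return text.encode(list_charset, errors="replace")


utf8_body = "café".encode("utf-8")

print(round_trip(utf8_body, "ascii"))  # b'caf??'
print(round_trip(utf8_body, "utf-8"))  # b'caf\xc3\xa9'

# An iso-8859-1 body is still damaged even with a utf-8 list charset.
print(round_trip("café".encode("iso-8859-1"), "utf-8"))
```

Each of the two UTF-8 bytes of `é` fails the ascii decode separately, which is why a single accented character can show up as two `?` in the archive.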