On 4/9/21 5:55 AM, Mark Dale via Mailman-Users wrote:
> 
> In the archive's downloaded .txt (and also .gz) file, the non-ascii 
> characters are missing and displayed as "?".
...
> Any advice on getting the non-ascii characters written into the archive .txt 
> file would be gratefully received.


The message is prepared for the .txt file by the Article.as_text()
method in HyperArch.py
<https://bazaar.launchpad.net/~mailman-coders/mailman/2.1/view/head:/Mailman/Archiver/HyperArch.py#L563>.
In order to do the email address obfuscation in the message body,
whether or not ARCHIVER_OBSCURES_EMAILADDRS is True, the method first
converts the body to unicode using the charset of the list's language
and then after possible obfuscation, converts it back, again using the
charset of the list's language. Both these conversions use
`errors=replace` which replaces any characters not in the charset with,
in the case of ascii, `?`.
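
As a rough illustration of the round trip described above (a standalone Python 3 sketch, not the actual HyperArch.py code; the `round_trip` helper is hypothetical):

```python
def round_trip(body: bytes, charset: str) -> bytes:
    """Decode to unicode and encode back, both with errors='replace'."""
    text = body.decode(charset, errors="replace")
    # Address obfuscation, if enabled, would operate on `text` here.
    return text.encode(charset, errors="replace")


# "Cafe" with an e-acute, encoded as utf-8 in the incoming message.
original = "Caf\u00e9".encode("utf-8")

ascii_result = round_trip(original, "us-ascii")
utf8_result = round_trip(original, "utf-8")

print(ascii_result)  # b'Caf??'
print(utf8_result == original)  # True
```

With an ascii list charset, each byte of the two-byte utf-8 sequence ends up as its own `?`; with utf-8 as the list charset the body survives unchanged.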

One way to avoid this replacement would be to change the charset for
English from ascii to utf-8. See <https://wiki.list.org/x/15958250>.

This isn't a complete solution when the non-ascii characters in the
original message are encoded in something other than `utf-8`, e.g.,
`iso-8859-1`, but it will probably handle most cases.
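
To make that limitation concrete, here is a small Python 3 sketch (again only an illustration, not Mailman code) of an `iso-8859-1` body passing through the same conversions when the list charset is utf-8:

```python
# "Cafe" with an e-acute, encoded as iso-8859-1 in the original message.
latin1_body = "Caf\u00e9".encode("iso-8859-1")  # b'Caf\xe9'

# The body is decoded with the list's charset (utf-8 here), not the
# message's own charset, so the lone 0xE9 byte is not valid utf-8.
as_unicode = latin1_body.decode("utf-8", errors="replace")

# Encoding back writes the U+FFFD replacement character, not the e-acute.
archived = as_unicode.encode("utf-8", errors="replace")

print(repr(as_unicode))  # 'Caf\ufffd'
print(archived)  # b'Caf\xef\xbf\xbd'
```

The character is still lost in this case; it just shows up as the unicode replacement character instead of `?`.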


-- 
Mark Sapiro <m...@msapiro.net>        The highway is for gamblers,
San Francisco Bay Area, California    better use your sense - B. Dylan
------------------------------------------------------
Mailman-Users mailing list -- mailman-users@python.org
To unsubscribe send an email to mailman-users-le...@python.org
https://mail.python.org/mailman3/lists/mailman-users.python.org/
Mailman FAQ: http://wiki.list.org/x/AgA3
Security Policy: http://wiki.list.org/x/QIA9
Searchable Archives: https://www.mail-archive.com/mailman-users@python.org/
    https://mail.python.org/archives/list/mailman-users@python.org/
