[Actually send the reply to the list as well.]

On Tue, September 1, 2015 12:15, Stephen J. Turnbull wrote:
> David Magda writes:
>
> > When I run 'bin/arch mylistname' I get the following:
> >
> > [...]
> > figuring article archives
> > 2005-October
> > /usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:176:
> > UnicodeWarning: Unicode equal comparison failed to convert both arguments
> > to Unicode - interpreting them as being unequal
> >   self.dict = marshal.load(fp)
> > /usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:74:
> > UnicodeWarning: Unicode equal comparison failed to convert both arguments
> > to Unicode - interpreting them as being unequal
> >   self.sorted.sort()
> > Updating index files for archive [2004-December]
> > [...]
> > Updating HTML for article 214
> > Pickling archive state into
> > /usr/local/mailman-2.1.20/archives/private/mylistname/pipermail.pck
> > Traceback (most recent call last):
>
> It would appear that you have non-ASCII character in the header of the
> 214th message of December 2004 (or maybe it's the 214th message
> overall). That message doesn't conform to the mail standards and
> should be repaired.
>
> Since pipermail is constructing an index, I would guess that you have
> a localized date header, a display name with an accented character in
> it, or a subject with an accented character in it. The character in
> question is e with a caret in the Latin-1 set, I don't know if that's
> the intended character set though.

Searching the mbox with `grep --color='auto' -P -n "\xea"`, I found only
one place where \xea appears in a header: a Subject line. I manually
edited the mbox (making a copy first) and replaced the accented-e
character with an ASCII "e", and I'm still getting the error (I did this
before e-mailing the list). There are other places which have \xea, but
not in any headers.

The 214 is the message count from a state file. Every time I rerun the
command the number is higher, but it seems to die in the same place.

In the middle of the output we have a "UnicodeWarning":

[...]
#00104 <...@acm.org>
figuring article archives
2005-September
#00105 <...@mikep>
figuring article archives
2005-September
#00106 <...@acm.org>
figuring article archives
2005-September
#00107 <...@mail.gmail.com>
figuring article archives
2005-October
/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:176:
UnicodeWarning: Unicode equal comparison failed to convert both arguments
to Unicode - interpreting them as being unequal
  self.dict = marshal.load(fp)
/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py:74:
UnicodeWarning: Unicode equal comparison failed to convert both arguments
to Unicode - interpreting them as being unequal
  self.sorted.sort()
Updating index files for archive [2004-December]
Date
Subject
Author
Thread
Computing threaded index
Updating HTML for article 757
Updating HTML for article 864
Updating HTML for article 866
Updating index files for archive [2005-April]
Date
[...]

Then the error at the end:

[...]
Updating index files for archive [2005-August]
[...]
Updating HTML for article 947
Updating HTML for article 840
Updating index files for archive [2005-September]
Date
Subject
Author
Thread
Computing threaded index
Updating HTML for article 841
Updating HTML for article 842
Updating HTML for article 843
Updating HTML for article 952
Updating HTML for article 845
Updating HTML for article 966
Updating HTML for article 846
Updating HTML for article 847
Updating HTML for article 848
Updating HTML for article 957
Updating HTML for article 958
Updating HTML for article 961
Updating HTML for article 962
Updating HTML for article 963
Updating HTML for article 964
Updating HTML for article 965
Updating HTML for article 851
Updating HTML for article 960
Updating HTML for article 861
Updating HTML for article 859
Updating HTML for article 860
Updating HTML for article 970
Pickling archive state into
/usr/local/mailman-2.1.20/archives/private/reactome-help/pipermail.pck
Traceback (most recent call last):
  File "bin/arch", line 201, in <module>
    main()
  File "bin/arch", line 189, in main
    archiver.processUnixMailbox(fp, start, end)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/pipermail.py", line 586, in processUnixMailbox
    self.add_article(a)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/pipermail.py", line 638, in add_article
    article.parentID = parentID = self.get_parent_info(arch, article)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/pipermail.py", line 658, in get_parent_info
    if self.database.hasArticle(archive, article.in_reply_to):
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 279, in hasArticle
    self.__openIndices(archive)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 257, in __openIndices
    t = DumbBTree(os.path.join(arcdir, archive + '-' + i))
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 66, in __init__
    self.load()
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 185, in load
    self.__sort(dirty=1)
  File "/usr/local/mailman-2.1.20/Mailman/Archiver/HyperDatabase.py", line 74, in __sort
    self.sorted.sort()
UnicodeDecodeError: 'ascii' codec can't decode byte 0xea in position 3: ordinal not in range(128)

It gets to 2005-September, creates the index files, enumerates 22
articles, pickles the archive state, and then dies. The 23rd and later
messages don't appear to contain any non-ASCII characters.

Can I patch pipermail.py or HyperDatabase.py (or ???) in some way to work
around this?

I have LANG=en_US.UTF-8 and LC_TIME=en_DK.UTF8 in my shell environment:
does that make a difference?

This used to work just fine, so I'm wondering what changed with the OS
upgrade. I should have a copy of the VM from before the upgrade in case
that's helpful.

Thanks for the help.
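
For anyone wanting to repeat the header check without grep, here is a
minimal sketch of the same search in Python. It is only an illustration,
not anything that ships with Mailman: the function name
find_non_ascii_headers is made up, it assumes the traditional mbox
convention that a message starts with a "From " line following a blank
line, and it is written for Python 3 even though Mailman 2.1 itself runs
under Python 2. It reports every header line containing a byte outside
the ASCII range, so it would also catch bytes other than \xea.

```python
def find_non_ascii_headers(mbox_path):
    """Yield (line_number, message_number, raw_line) for every header
    line that contains at least one byte >= 0x80.

    Hypothetical helper: assumes each message begins with a "From "
    separator line that is either the first line of the file or
    directly follows a blank line.
    """
    in_headers = False
    prev_blank = True      # the start of the file counts as a boundary
    message_number = 0
    with open(mbox_path, "rb") as fp:
        for line_number, raw in enumerate(fp, start=1):
            line = raw.rstrip(b"\r\n")
            if prev_blank and line.startswith(b"From "):
                # Separator line: a new message and its headers begin.
                message_number += 1
                in_headers = True
            elif in_headers and line == b"":
                # First blank line after the headers: the body begins.
                in_headers = False
            elif in_headers and any(byte >= 0x80 for byte in line):
                yield line_number, message_number, line
            prev_blank = (line == b"")
```

Calling list(find_non_ascii_headers("mylistname.mbox")) (the file name
is just a placeholder) gives the line number, the ordinal of the message
within the mbox, and the raw header line for each hit. Body lines are
skipped on purpose, since the \xea bytes outside the headers are not the
ones under suspicion here.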