On 2016-07-13 04:53 (+0200), Gav <[email protected]> wrote: 
> Hi,
> 
> When running :-
> 
> python3 import-mbox.py --source https://mail-archives.apache.org/mod_mbox/
> --mod-mbox --project httpd
> 
> I get what seems to be all lists being slurped OK, but the output ends with:-
> 
> ...
> Parsed httpd-announce/201511.mbox: 9 records from
> 310ede37d0739990f3e1338778ae6f2e5b31117916caa2f1469651a8
> 2015 elements left to slurp
> Slurping httpd-announce/201510.mbox
> Found attachment: Notice_to_Appear_00000681680.zip
> Found attachment: Court_Notification_00000155647.zip
> Found attachment: 00174586.zip
> Date seems totally wrong, setting to _now_ instead.
> Exception in thread Thread-1:
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
>     self.run()
>   File "import-mbox.py", line 263, in run
>     json, contents = foo.compute_updates(list_override, private, message)
>   File "/var/www/incubator-ponymail/tools/archiver.py", line 274, in
> compute_updates
>     mdatestring = time.strftime("%Y/%m/%d %H:%M:%S",
> time.gmtime(email.utils.mktime_tz(mdate)))
>   File "/usr/lib/python3.5/email/_parseaddr.py", line 185, in mktime_tz
>     if data[9] is None:
> IndexError: tuple index out of range

This appears to happen when an email violates the RFC (missing Date header).
Pony Mail then tries to fake the date by falling back to the current time, but
ends up with a 9-tuple instead of the 10-tuple that mktime_tz requires (the
10th element is the timezone offset, which is what data[9] reads).

I've pushed a change to master which appends a 10th element to the 9-tuple, 
making the date formatter work again. Please try using archiver.py from master 
and report back if that fixes your issue :)
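
For anyone curious, here is a rough sketch of the idea, not the actual commit.
The helper names to_ten_tuple and format_date are made up for illustration,
and using 0 (UTC) as the appended offset is my assumption here; only the
strftime/gmtime/mktime_tz line is taken from the traceback above.

```python
import email.utils
import time


def to_ten_tuple(parsed):
    """Hypothetical helper: return a 10-tuple that mktime_tz can accept.

    `parsed` is the result of email.utils.parsedate_tz(), which is None
    when the Date header is missing or cannot be parsed.
    """
    if parsed is None:
        # Fall back to "now". time.gmtime() yields only 9 fields.
        parsed = time.gmtime()
    fields = tuple(parsed)
    if len(fields) == 9:
        # Append a 10th element: the UTC offset in seconds (assumed 0).
        fields = fields + (0,)
    return fields


def format_date(parsed):
    """Format the date the same way the failing line in the traceback does."""
    mdate = to_ten_tuple(parsed)
    return time.strftime("%Y/%m/%d %H:%M:%S",
                         time.gmtime(email.utils.mktime_tz(mdate)))


# A well-formed header already parses to a 10-tuple and passes through as-is.
print(format_date(email.utils.parsedate_tz("Tue, 13 Oct 2015 10:00:00 +0000")))

# A missing header (None) no longer raises IndexError; it formats "now".
print(format_date(None))
```

Passing the bare 9-field result of time.gmtime() straight to mktime_tz is what
raises the IndexError, since it looks at index 9 first.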

Ideally, emails should follow protocol, but these specific ones (which appear 
to be scam/malware) certainly don't.

> 
> All done! 54 records inserted/updated after 67 seconds. 0 records were bad
> and ignored
> 
> 
> Any ideas?
> 
> FYI I set the VM TZ to UTC with no effect.
> 
> Gav...
> 
> ...
> 
