OK, I've added a basic error report.

Note: I've since found the SpamAssassin e-mail corpus, and a couple of
the easy_ham mails look as though they have the same problem.

I'm about to start investigating.
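
In case it helps with tracking down the message in a private mbox, here
is a minimal sketch of one way to list the messages that as_string()
cannot flatten. It is not the error report added to import-mbox.py; the
names find_unflattenable and scan_mbox are hypothetical, and it assumes
the failure surfaces as a UnicodeEncodeError, as in the traceback quoted
below.

```python
import mailbox


def find_unflattenable(messages):
    """Return (index, Message-ID, error text) for every message whose
    as_string() call raises UnicodeEncodeError."""
    failures = []
    for index, message in enumerate(messages):
        try:
            # Same call that fails in the quoted traceback.
            message.as_string()
        except UnicodeEncodeError as err:
            msgid = message.get('Message-ID', '(no Message-ID)')
            failures.append((index, msgid, str(err)))
    return failures


def scan_mbox(path):
    """Hypothetical helper: open an existing mbox file and scan it."""
    box = mailbox.mbox(path, create=False)
    try:
        # Iterating a mailbox yields its messages in order.
        return find_unflattenable(box)
    finally:
        box.close()
```

The result only contains the position in the mbox, the Message-ID and
the error text, so it can be shared without disclosing message bodies.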


On 22 November 2016 at 12:46, Francesco Chicchiriccò
<[email protected]> wrote:
> On 22/11/2016 10:16, sebb wrote:
>>
>> Sorry about that, I decided to change the thread id to its name and
>> did not change all the references.
>> Should be OK now.
>
>
> Yes, I confirm it is (getting the original exception).
>
>> Going back to the original encoding issue: I have tried and failed to
>> reproduce it.
>>
>> Can you find out which mbox caused the problem so I can take a look?
>
>
> I know which mbox is causing the problem, but it's a private mailing list,
> so I'd rather play it safe and extract the troublesome message into a
> separate mbox, possibly changing some bits to avoid unwanted disclosure.
>
> Is there an easy way to add a debug statement showing which message is
> actually causing the trouble?
>
> FYI at the moment the stacktrace is
>
> Traceback (most recent call last):
>   File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
>     self.run()
>   File "import-mbox.py", line 295, in run
>     'source': message.as_string()
>   File "/usr/lib/python3.5/email/message.py", line 159, in as_string
>     g.flatten(self, unixfrom=unixfrom)
>   File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
>     self._write(msg)
>   File "/usr/lib/python3.5/email/generator.py", line 181, in _write
>     self._dispatch(msg)
>   File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
>     meth(msg)
>   File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
>     msg.set_payload(payload, charset)
>   File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
>     payload = payload.encode(charset.output_charset)
> UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in
> position 3657: ordinal not in range(128)
>
> All done! 0 records inserted/updated after 19 seconds. 0 records were bad
> and ignored
>
> Regards.
>
>
>> On 22 November 2016 at 07:23, Francesco Chicchiriccò<[email protected]>
>> wrote:
>>>
>>> Hi all,
>>> after the latest commits, I now get the following error when importing
>>> from mbox:
>>>
>>> Exception in thread Thread-1:
>>> Traceback (most recent call last):
>>>    File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
>>>      self.run()
>>>    File "import-mbox.py", line 314, in run
>>>      bulk.assign(self.id, ja, es, 'mbox')
>>> AttributeError: 'SlurpThread' object has no attribute 'id'
>>>
>>> Regards.
>>>
>>>
>>> On 21/11/2016 17:19, sebb wrote:
>>>>
>>>> On 21 November 2016 at 11:52, Daniel Gruno <[email protected]> wrote:
>>>>>
>>>>> On 11/21/2016 12:50 PM, sebb wrote:
>>>>>>
>>>>>> On 21 November 2016 at 11:40, Francesco Chicchiriccò
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi all,
>>>>>>> not sure but it seems that the commit below broke my scheduled import
>>>>>>> from mbox:
>>>>>>
>>>>>> It won't be that commit, most likely the fix for #251
>>>>>>
>>>>>>
>>>>>>
>>>>>> https://github.com/apache/incubator-ponymail/commit/1a3bff403166c917738fd02acefc988b909d4eae#diff-0102373f79eaa72ffaff3ce7675b6a43
>>>>>>
>>>>>> This presumably means the archiver would have fallen over with the
>>>>>> same e-mail.
>>>>>> Or there is an encoding problem with writing the mail to the mbox - or
>>>>>> reading it - so the importer is not seeing the same input as the
>>>>>> archiver.
>>>>>
>>>>> The importer usually sees things as ASCII, whereas the archiver _can_
>>>>> get fed input as unicode by postfix (I don't know why, but there it
>>>>> is).
>>>>> This may explain why. I think as_bytes is a safer way to archive, as
>>>>> it's binary.
>>>>
>>>> That all depends how the binary is generated.
>>>> As far as I can tell, the parsed message is not stored as binary, so
>>>> it has to be encoded to create the bytes.
>>>>
>>>>>> It would be useful to know what the message is that causes the issue.
>>>>>>
>>>>>> If you can find it I can take a look later.
>>>>>>
>>>>>>> Exception in thread Thread-1:
>>>>>>> Traceback (most recent call last):
>>>>>>>     File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
>>>>>>>       self.run()
>>>>>>>     File "import-mbox.py", line 297, in run
>>>>>>>       'source': message.as_string()
>>>>>>>     File "/usr/lib/python3.5/email/message.py", line 159, in as_string
>>>>>>>       g.flatten(self, unixfrom=unixfrom)
>>>>>>>     File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
>>>>>>>       self._write(msg)
>>>>>>>     File "/usr/lib/python3.5/email/generator.py", line 181, in _write
>>>>>>>       self._dispatch(msg)
>>>>>>>     File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
>>>>>>>       meth(msg)
>>>>>>>     File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
>>>>>>>       msg.set_payload(payload, charset)
>>>>>>>     File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
>>>>>>>       payload = payload.encode(charset.output_charset)
>>>>>>> UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 3657: ordinal not in range(128)
>>>>>>>
>>>>>>> Any hint / workaround?
>
>
> --
> Francesco Chicchiriccò
>
> Tirasa - Open Source Excellence
> http://www.tirasa.net/
>
> Member at The Apache Software Foundation
> Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
> http://home.apache.org/~ilgrosso/
>
