On 22/11/2016 10:16, sebb wrote:
Sorry about that: I decided to change the thread id to its name, but did
not update all the references.
Should be OK now.
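The `AttributeError` reported further down came from a single leftover `self.id` after this rename. Below is a minimal sketch of the pattern. It assumes `SlurpThread` subclasses `threading.Thread` and reuses the thread's built-in `name`. The real class in import-mbox.py may be structured differently, and the `result` string is a hypothetical stand-in for the `bulk.assign(...)` call.

```python
import threading


class SlurpThread(threading.Thread):
    """Hypothetical reduction of the importer's worker thread."""

    def __init__(self, name):
        # threading.Thread already keeps a `name` attribute, so no
        # separate `id` attribute is stored any more.
        super().__init__(name=name)
        self.result = None

    def run(self):
        # Every former `self.id` must now read `self.name`; one leftover
        # `self.id` would raise AttributeError inside the thread.
        self.result = "bulk assign for %s" % self.name


worker = SlurpThread("Thread-1")
worker.start()
worker.join()
print(worker.result)  # bulk assign for Thread-1
```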

Yes, I can confirm it is: I'm getting the original exception again.

Going back to the original encoding issue: I have tried and failed to
reproduce it.

Can you find out which mbox caused the problem so I can take a look?

I know which mbox is causing the problem, but it belongs to a private mailing list, so I'd rather play it safe: extract the troublesome message into a separate mbox, possibly changing some bits to avoid unwanted disclosures.
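Here is a minimal sketch of that extraction. It assumes the archive is a plain mbox file readable by Python's `mailbox` module, and that the offending message can be identified by its Message-ID. `extract_message` and the list of redacted headers are hypothetical choices. The body is copied untouched so the problematic bytes survive, which means any private content in the body still has to be edited by hand.

```python
import mailbox
import os
import tempfile


def extract_message(src_path, dst_path, message_id,
                    redact=("From", "To", "Cc")):
    """Copy the message with the given Message-ID into the mbox at
    dst_path, replacing the listed headers with a placeholder.
    Returns True if the message was found and copied."""
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    try:
        for msg in src:
            if msg.get("Message-ID") != message_id:
                continue
            for header in redact:
                if header in msg:  # replace_header raises KeyError otherwise
                    msg.replace_header(header, "redacted@example.invalid")
            # The body is left as-is so the offending bytes are preserved.
            dst.add(msg)
            dst.flush()
            return True
        return False
    finally:
        src.close()
        dst.close()


# Usage demo with throwaway files; the sample messages are made up.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "private.mbox")
dst_path = os.path.join(workdir, "extract.mbox")
box = mailbox.mbox(src_path)
box.add(b"From: alice@example.org\nMessage-ID: <m1@example.org>\n"
        b"Content-Type: text/plain; charset=us-ascii\n\ncaf\xe9\n")
box.add(b"From: bob@example.org\nMessage-ID: <m2@example.org>\n\nhello\n")
box.close()

found = extract_message(src_path, dst_path, "<m1@example.org>")
out = mailbox.mbox(dst_path)
extracted = list(out)
out.close()
print(found, len(extracted))  # True 1
```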

Is there an easy way to add a debug statement that reports which message is actually causing the trouble?
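A quick in-place option is to wrap the `message.as_string()` call from the traceback in `try`/`except` and print `message['Message-ID']` before re-raising. A standalone scan can also pinpoint the message without touching the importer. The sketch below assumes the archive is a standard mbox file readable by Python's `mailbox` module, and that calling `as_string()` on each message reproduces the importer's failure. The importer may parse messages differently, and `find_unflattenable` is a hypothetical helper name. On Python 3.5 a message like the one in the traceback should show up in the result. Newer Python versions may flatten it without error.

```python
import mailbox
import os
import tempfile


def find_unflattenable(path):
    """Return (key, Message-ID, error text) for each message in the mbox
    at `path` whose as_string() call fails."""
    failures = []
    box = mailbox.mbox(path, create=False)
    try:
        for key, msg in box.iteritems():
            try:
                msg.as_string()
            except (UnicodeError, LookupError) as exc:
                # The Message-ID identifies the offender without
                # printing any of its private content.
                failures.append((key, msg.get("Message-ID"), str(exc)))
    finally:
        box.close()
    return failures


# Usage demo on a throwaway mbox holding one well-formed message.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.mbox")
demo = mailbox.mbox(demo_path)
demo.add(b"From: a@example.org\nMessage-ID: <ok@example.org>\n\nhello\n")
demo.close()

failures = find_unflattenable(demo_path)
print(failures)  # []
```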

FYI, at the moment the stack trace is:

Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "import-mbox.py", line 295, in run
    'source': message.as_string()
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
    msg.set_payload(payload, charset)
  File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
    payload = payload.encode(charset.output_charset)
UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 3657: ordinal not in range(128)

All done! 0 records inserted/updated after 19 seconds. 0 records were bad and ignored
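The following is our reading of the Python 3.5 `email` sources; it is an interpretation, not something confirmed in this thread. The traceback fits a body that was parsed from bytes and declares `charset=us-ascii` but actually contains 8-bit bytes. `Message.get_payload()` decodes such a body with the declared charset and `errors='replace'`, so each stray byte becomes U+FFFD. The generator then passes that text to `set_payload()`, which re-encodes it strictly and fails on the first U+FFFD. Here are the two codec steps in isolation, using a made-up body:

```python
# Stand-in for an 8-bit body whose Content-Type claims charset=us-ascii.
raw_body = b"caf\xe9\n"

# Step 1: lenient decode with the declared charset, as get_payload() does;
# the undecodable byte becomes U+FFFD (the '\ufffd' in the traceback).
text = raw_body.decode("us-ascii", "replace")
print(ascii(text))  # 'caf\ufffd\n'

# Step 2: strict re-encode to the same charset, as set_payload() does.
try:
    text.encode("us-ascii")
    error = None
except UnicodeEncodeError as exc:
    error = str(exc)
print(error)
# 'ascii' codec can't encode character '\ufffd' in position 3: ordinal not in range(128)
```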

Regards.

On 22 November 2016 at 07:23, Francesco Chicchiriccò <[email protected]> wrote:
Hi all,
after the latest commits, I now get the following error when importing from
mbox:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "import-mbox.py", line 314, in run
    bulk.assign(self.id, ja, es, 'mbox')
AttributeError: 'SlurpThread' object has no attribute 'id'

Regards.


On 21/11/2016 17:19, sebb wrote:
On 21 November 2016 at 11:52, Daniel Gruno <[email protected]> wrote:
On 11/21/2016 12:50 PM, sebb wrote:
On 21 November 2016 at 11:40, Francesco Chicchiriccò
<[email protected]> wrote:
Hi all,
not sure, but it seems that the commit below broke my scheduled import
from mbox:
It won't be that commit; most likely it's the fix for #251.


https://github.com/apache/incubator-ponymail/commit/1a3bff403166c917738fd02acefc988b909d4eae#diff-0102373f79eaa72ffaff3ce7675b6a43

This presumably means the archiver would have fallen over with the same
e-mail.
Or there is an encoding problem with writing the mail to the mbox - or
reading it - so the importer is not seeing the same input as the
archiver.
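One way to rule the mbox layer in or out is to push the raw message through a scratch mbox and compare the result. This assumes the file is written and read with Python's `mailbox` module; the archiver may write it differently. With the standard `mbox` class, 8-bit body bytes come back as they went in. `mbox_round_trip` is a hypothetical helper, and the sample message is made up.

```python
import mailbox
import os
import tempfile


def mbox_round_trip(raw):
    """Append `raw` (bytes) to a fresh mbox and return the bytes read back."""
    path = os.path.join(tempfile.mkdtemp(), "scratch.mbox")
    box = mailbox.mbox(path)
    try:
        key = box.add(raw)
        box.flush()
        # get_bytes() skips the "From " envelope line that mbox adds.
        return box.get_bytes(key)
    finally:
        box.close()


raw = (b"From: someone@example.org\n"
       b"Content-Type: text/plain; charset=us-ascii\n"
       b"\n"
       b"caf\xe9\n")
back = mbox_round_trip(raw)
print(b"caf\xe9" in back)  # True
```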
The importer usually sees things as ASCII, whereas the archiver _can_
be fed Unicode input by Postfix (I don't know why, but there it is).
This may explain the difference. I think as_bytes() is a safer way to
archive, as its output is binary.
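Here is a small experiment consistent with this explanation. It assumes the archiver ends up parsing a `str` while the importer parses raw `bytes`; the actual parsing code of neither script appears in this thread. The sample mail is made up: its headers claim us-ascii, but the body holds a Latin-1 `é`.

```python
import email

# The same mail, once as text (archiver-like input) and once as raw bytes
# (importer-like input).
raw = (b"Subject: test\n"
       b"Content-Type: text/plain; charset=us-ascii\n"
       b"\n"
       b"caf\xe9\n")

from_text = email.message_from_string(raw.decode("latin-1"))
from_bytes = email.message_from_bytes(raw)

# Text-parsed: the body is ordinary Unicode, so as_string() has
# nothing to re-encode and simply writes it out.
text_source = from_text.as_string()

# Bytes-parsed: as_bytes() writes the original 8-bit body back verbatim.
byte_source = from_bytes.as_bytes()

print("caf\xe9" in text_source)   # True
print(b"caf\xe9" in byte_source)  # True
```

This only shows that `as_bytes()` avoids the re-encode. It does not make the body valid in its declared charset.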
That all depends how the binary is generated.
As far as I can tell, the parsed message is not stored as binary, so
it has to be encoded to create the bytes.
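For the bytes-parsed case, this can be seen with the standard library alone. The example uses a made-up sample and relies on CPython's default (compat32) `email` behaviour rather than on anything in import-mbox.py. The parsed body is held as text, with undecodable bytes carried as surrogate escapes, and both the text view and the bytes view are derived from it on demand.

```python
import email

raw = (b"Content-Type: text/plain; charset=us-ascii\n"
       b"\n"
       b"caf\xe9\n")
msg = email.message_from_bytes(raw)

# Text view: decoded with the declared charset, so the 8-bit byte
# comes back as U+FFFD.
body_text = msg.get_payload()
# Bytes view: re-encoded from the stored text, recovering the original byte.
body_bytes = msg.get_payload(decode=True)

print(ascii(body_text))  # 'caf\ufffd\n'
print(body_bytes)        # b'caf\xe9\n'
```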

It would be useful to know what the message is that causes the issue.

If you can find it I can take a look later.

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "import-mbox.py", line 297, in run
    'source': message.as_string()
  File "/usr/lib/python3.5/email/message.py", line 159, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/usr/lib/python3.5/email/generator.py", line 115, in flatten
    self._write(msg)
  File "/usr/lib/python3.5/email/generator.py", line 181, in _write
    self._dispatch(msg)
  File "/usr/lib/python3.5/email/generator.py", line 214, in _dispatch
    meth(msg)
  File "/usr/lib/python3.5/email/generator.py", line 243, in _handle_text
    msg.set_payload(payload, charset)
  File "/usr/lib/python3.5/email/message.py", line 316, in set_payload
    payload = payload.encode(charset.output_charset)
UnicodeEncodeError: 'ascii' codec can't encode character '\ufffd' in position 3657: ordinal not in range(128)

Any hint / workaround?
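One possible workaround is to fall back to `as_bytes()` when `as_string()` fails. This assumes the `'source'` field only needs a readable text copy of the mail; `message_source` is a hypothetical helper, not part of import-mbox.py. Decoding the bytes as UTF-8 with replacement is a lossy choice made for this sketch. Storing the raw bytes (for example base64-encoded) would preserve them exactly.

```python
import email


def message_source(message):
    """Return a text copy of `message` suitable for the 'source' field.

    Tries as_string() first; if flattening fails, falls back to the raw
    bytes decoded as UTF-8, with undecodable bytes replaced by U+FFFD.
    """
    try:
        return message.as_string()
    except (UnicodeError, LookupError):
        # as_bytes() writes 8-bit bodies back verbatim instead of
        # re-encoding them to the declared charset. LookupError is what
        # Python raises for an unknown codec name, e.g. a bogus charset.
        return message.as_bytes().decode("utf-8", "replace")


raw = (b"Subject: test\n"
       b"Content-Type: text/plain; charset=us-ascii\n"
       b"\n"
       b"caf\xe9\n")
source = message_source(email.message_from_bytes(raw))
print(source.startswith("Subject: test"))  # True
```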

--
Francesco Chicchiriccò

Tirasa - Open Source Excellence
http://www.tirasa.net/

Member at The Apache Software Foundation
Syncope, Cocoon, Olingo, CXF, OpenJPA, PonyMail
http://home.apache.org/~ilgrosso/
