This morning I had a similar issue, but with the email:cc: !binary encoding. 

Patching the subject line handling doesn’t fix the cc line.

I’m concerned that *any* UTF-8 encoding of email fields will cause the same issue.

Can we find the place where the !binary encoding is chosen instead of “normal” UTF-8?
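For reference, a minimal sketch assuming pending.yml is written with Ruby's standard YAML library (Psych). Psych emits !binary for any String whose encoding is ASCII-8BIT and which contains non-ASCII bytes, no matter which field it sits in. So the choice follows from how each string is tagged, and relabeling binary strings as UTF-8 before dumping would cover email:cc and every other field at once. The utf8ize helper and the sample address are hypothetical, not part of the workbench.

```ruby
require 'yaml'

# Hypothetical helper: walk a structure bound for (or loaded from) pending.yml
# and relabel each ASCII-8BIT string as UTF-8 when its bytes are valid UTF-8.
# Strings that are not valid UTF-8 are returned unchanged, so they stay binary.
def utf8ize(value)
  case value
  when Hash
    value.each_with_object({}) { |(k, v), h| h[utf8ize(k)] = utf8ize(v) }
  when Array
    value.map { |v| utf8ize(v) }
  when String
    return value unless value.encoding == Encoding::ASCII_8BIT
    utf8 = value.dup.force_encoding(Encoding::UTF_8)
    utf8.valid_encoding? ? utf8 : value
  else
    value
  end
end

entry = { 'email:cc' => "Gosha \xE2\x80\x94 <g@example.com>".b }
puts YAML.dump(entry)           # email:cc is written as !binary (Base64)
puts YAML.dump(utf8ize(entry))  # email:cc is written as ordinary UTF-8 text
```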

Thanks,

Craig

> On Aug 28, 2016, at 5:37 PM, Sam Ruby <[email protected]> wrote:
> 
> On Sun, Aug 28, 2016 at 8:04 PM, Craig Russell <[email protected]> 
> wrote:
>> Can you please take a look and see why the rescue didn’t work?
> 
> Logs can be found here:
> 
> https://whimsy.apache.org/members/log/
> 
> In particular, https://whimsy.apache.org/members/log/whimsy_error.log
> 
> What I am still seeing is:
> 
> _ERROR #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT
> to UTF-8>, referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
> 
> And further up the stack traceback:
> 
> _WARN   
> /usr/local/rvm/gems/ruby-2.3.1/gems/mail-2.6.4/lib/mail/message.rb:1887:in
> `to_s', referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
> _WARN   /x1/srv/whimsy/www/secretary/workbench/file.cgi:318:in `block
> in send_email', referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
> 
> So, you are not hitting the exception handler, and you are dying later
> when trying to convert the message (which includes a binary subject)
> into a string.
> 
> The reason why you are not hitting the exception handler is that you
> are not calling force_encoding.  A second problem is that if an
> exception were to be raised, you wouldn't be catching it as the
> exception needs to be qualified: Encoding::UndefinedConversionError
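The failure and the qualified rescue can be reproduced outside the workbench in a few lines of plain Ruby. A sketch only: the subject is copied from the pending.yml dump further down, and .b stands in for however the workbench ends up holding an ASCII-8BIT string.

```ruby
# Subject bytes as they come back from pending.yml: tagged ASCII-8BIT (binary).
raw = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

begin
  # Converting binary bytes to UTF-8 fails just as in whimsy_error.log.
  raw.encode(Encoding::UTF_8)
rescue Encoding::UndefinedConversionError => e
  # The class lives under Encoding::; a bare UndefinedConversionError
  # would itself raise NameError when Ruby evaluates this rescue clause.
  warn "conversion failed: #{e.message}"
end

# force_encoding relabels the same bytes as UTF-8 without converting them.
subject = raw.dup.force_encoding(Encoding::UTF_8)
puts subject if subject.valid_encoding?
```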
> 
>> Thanks,
>> 
>> Craig
> 
> - Sam Ruby
> 
>>> On Aug 28, 2016, at 4:30 PM, Sam Ruby <[email protected]> wrote:
>>> 
>>> On Sun, Aug 28, 2016 at 6:15 PM, Craig Russell <[email protected]> 
>>> wrote:
>>>> I’m blind here. I can’t see the pending.yml. I can’t see the error 
>>>> console. I don’t even know if my change was pushed to production.
>>>> 
>>>> What tools do I need to see what’s going on?
>>> 
>>> What code is actually deployed can be seen on the last two lines of
>>> the status page: https://whimsy.apache.org/status/
>>> 
>>> Nothing in the (current) workbench shows the raw contents of
>>> pending.yml.  It would be easy to add as a new CGI script.  It could
>>> even be added as a new action in file.cgi.
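A page like that could start from a helper along these lines. Hypothetical sketch: describe_pending is a made-up name, the location of pending.yml is not given in this thread, and string values are relabeled as UTF-8 for display only (invalid bytes shown as ?), with the encoding Ruby actually assigned printed alongside.

```ruby
require 'yaml'

# Hypothetical helper: render the raw contents of a pending.yml file as text,
# showing each field's value and the encoding Ruby assigned to it on load.
def describe_pending(path)
  entries = YAML.load_file(path) || []
  entries.each_with_index.map do |entry, i|
    lines = entry.map do |key, value|
      if value.is_a?(String)
        shown = value.dup.force_encoding(Encoding::UTF_8)
        shown = shown.scrub('?') unless shown.valid_encoding?
        "  #{key}: #{shown} (#{value.encoding})"
      else
        "  #{key}: #{value.inspect}"
      end
    end
    "entry #{i}:\n" + lines.join("\n")
  end.join("\n")
end
```

A CGI script would print a "Content-Type: text/plain; charset=utf-8" header and a blank line, then puts describe_pending(path).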
>>> 
>>> Alternatively, we could ask for you to be given shell access to
>>> whimsy-vm3.
>>> 
>>>> Thanks,
>>>> 
>>>> Craig
>>> 
>>> - Sam Ruby
>>> 
>>>>> On Aug 28, 2016, at 2:30 PM, Craig Russell <[email protected]> 
>>>>> wrote:
>>>>> 
>>>>>> 
>>>>>> On Aug 28, 2016, at 6:04 AM, Sam Ruby <[email protected]> wrote:
>>>>>> 
>>>>>> On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
>>>>>> <[email protected]> wrote:
>>>>>>> The processing of email:subject seems to be localized to file.cgi, around
>>>>>>> line 261:
>>>>>>> 
>>>>>>>       # override subject?
>>>>>>>       if vars.email_subject and !vars.email_subject.empty?
>>>>>>>         if vars.email_subject =~ /^re:\s/i
>>>>>>>           subject vars.email_subject
>>>>>>>         else
>>>>>>>           subject 'Re: ' + vars.email_subject
>>>>>>>         end
>>>>>>>       end
>>>>>>> 
>>>>>>> I can’t see where the actual problem is, but is there a way to either:
>>>>>>> 
>>>>>>> 1. have whichever component created vars.email_subject recognize UTF-8 
>>>>>>> characters and pass them as characters instead of binary
>>>>>>> 
>>>>>>> 2. recognize that this has happened here and replace the subject with 
>>>>>>> an innocuous subject based on the document type.
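Option 2 might look roughly like the following. The FALLBACK_SUBJECTS table and reply_subject method are hypothetical, and the fallback strings are placeholders rather than what the erb templates actually set. Validity is checked before the regexp because matching against invalid UTF-8 raises ArgumentError.

```ruby
# Hypothetical placeholder subjects per document type; the real erb templates
# (e.g. icla.erb) may set something different.
FALLBACK_SUBJECTS = {
  'icla'  => 'Re: ICLA',
  'ccla'  => 'Re: CCLA',
  'grant' => 'Re: Software Grant'
}.freeze

# Returns the reply subject, nil to leave the template's subject alone,
# or an innocuous per-doctype subject when the bytes are not valid UTF-8.
def reply_subject(raw_subject, doctype)
  return nil if raw_subject.nil? || raw_subject.empty?

  subject = raw_subject.dup.force_encoding(Encoding::UTF_8)
  # Check validity first: matching a regexp against invalid UTF-8 raises ArgumentError.
  return FALLBACK_SUBJECTS.fetch(doctype, 'Re: your document') unless subject.valid_encoding?

  subject =~ /^re:\s/i ? subject : 'Re: ' + subject
end

puts reply_subject("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b, 'icla')
```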
>>>>>> 
>>>>>> All of your analysis seems to be on target.
>>>>>> 
>>>>>> This is from the log:
>>>>>> 
>>>>>> [Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
>>>>>> 139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
>>>>>> #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
>>>>>> UTF-8>, referer:
>>>>>> https://whimsy.apache.org/secretary/workbench/file.cgi
>>>>>> 
>>>>>> Looking at pending.yml with the interactive ruby shell:
>>>>>> 
>>>>>> $ irb
>>>>>> irb(main):001:0> require 'yaml'
>>>>>> => true
>>>>>> irb(main):002:0> pending = YAML.load_file('pending.yml')
>>>>>> => [{"doctype"=>"icla",
>>>>>> "source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
>>>>>> "realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
>>>>>> "email"=>"[email protected]", "filename"=>"heorhi-arynich.pdf",
>>>>>> "nname"=>"Gosha Arinich", "nemail"=>"[email protected]",
>>>>>> "iname"=>"Gosha Arinich", "iemail"=>"[email protected]",
>>>>>> "uname"=>"Gosha Arinich", "uemail"=>"[email protected]",
>>>>>> "pname"=>"Gosha Arinich", "pemail"=>"[email protected]",
>>>>>> "memail"=>"[email protected]", "gname"=>"Gosha Arinich",
>>>>>> "gemail"=>"[email protected]", "contact"=>"Gosha Arinich",
>>>>>> "cemail"=>"[email protected]", "ipodling"=>" ",
>>>>>> "email:addr"=>"[email protected]",
>>>>>> "email:id"=>"<ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>",
>>>>>> "email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
>>>>>> Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
>>>>>> irb(main):003:0> pending.first['email:subject']
>>>>>> => "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
>>>>>> irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
>>>>>> => "ICLA — Gosha Arinich aka goshakkk"
>>>>>> 
>>>>>> Not surprising, given the tortuous path that the subject goes through
>>>>>> in the current workbench implementation.  A cron job extracts the
>>>>>> subject line from the email using Python libraries and puts it into an
>>>>>> svn property associated with the file.  The workbench then uses the
>>>>>> command line to extract that property and parses the output of the
>>>>>> command.  What is surprising is that, if there is an error in handling
>>>>>> non-ASCII characters, it hasn't shown up before, and more
>>>>>> frequently.  I'm pretty sure that non-ASCII characters have been seen
>>>>>> before, and I'm not sure what is different about this email.
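The thread does not show how the workbench reads that svn property. Assuming it captures the output of a command, one option is to declare the pipe's encoding so the captured string is labeled UTF-8 from the start, instead of inheriting the process's default external encoding, which in a CGI environment with a C locale may not be UTF-8. This is a sketch, not the workbench code: command_output_utf8 is hypothetical, and a Ruby one-liner stands in for svn so the example runs anywhere.

```ruby
require 'rbconfig'

# Hypothetical helper: run a command such as
#   ['svn', 'propget', 'email:subject', path]
# and return its output labeled UTF-8, whatever the process locale is.
def command_output_utf8(cmd)
  IO.popen(cmd, external_encoding: Encoding::UTF_8, &:read)
end

# Stand-in for svn: a Ruby one-liner that writes the em-dash bytes E2 80 94.
demo = [RbConfig.ruby, '-e', 'STDOUT.binmode; print [0xE2, 0x80, 0x94].pack("C*")']
out = command_output_utf8(demo)
puts out.encoding
puts out.valid_encoding?
```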
>>>>> 
>>>>> I’ve seen plenty of non-ASCII characters, but this is the first time I’ve
>>>>> seen one in the three-byte UTF-8 representation.
>>>>>> 
>>>>>> In any case, suggested fixes:
>>>>>> 
>>>>>> 1) add "vars.email_subject.force_encoding('utf-8') if
>>>>>> vars.email_subject.encoding == Encoding::BINARY" before the inner if
>>>>>> statement.  It should be harmless in cases that currently work, and
>>>>>> should fix this case.  In cases where the data is binary data that
>>>>>> can't be interpreted as utf-8, it will continue to blow up.
>>>>>> 
>>>>>> 2) add 'begin...rescue...end' around the inner if statement.  Note:
>>>>>> you don't need to set subject in the rescue clause as it was set by
>>>>>> the relevant erb file (e.g. icla.erb).  More information on rescue
>>>>>> statements: http://phrogz.net/programmingruby/tut_exceptions.html
>>>>>> 
>>>>>> These changes should enable you to process the currently pending action.
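Put together, the two suggestions might look like this when applied to the override block quoted above. A sketch under assumptions: vars is reduced to a Struct with email_subject, and the block passed to override_subject stands in for Mail's subject call. Beyond what Sam suggested, it also rescues ArgumentError, because matching a regexp against bytes that are not valid UTF-8 raises that rather than Encoding::UndefinedConversionError.

```ruby
# Sketch of file.cgi's subject override with both suggested fixes.
# `vars` stands in for the CGI variables; the block stands in for Mail's `subject`.
def override_subject(vars)
  raw = vars.email_subject
  return unless raw && !raw.empty?

  begin
    # Fix 1: relabel binary bytes as UTF-8; strings in other encodings are left as they are.
    raw = raw.dup.force_encoding(Encoding::UTF_8) if raw.encoding == Encoding::BINARY
    yield(raw =~ /^re:\s/i ? raw : 'Re: ' + raw)
  rescue Encoding::UndefinedConversionError, ArgumentError => e
    # Fix 2: give up on the override and keep the subject the erb template set.
    # ArgumentError covers bytes that are not valid UTF-8 (the regexp match raises it).
    warn "subject override skipped: #{e.class}: #{e.message}"
  end
end

Vars = Struct.new(:email_subject)
override_subject(Vars.new("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b)) { |s| puts s }
```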
>>>>> 
>>>>> Now waiting for deployment…
>>>>> 
>>>>> Craig
>>>>> 
>>>>>> 
>>>>>>> Craig
>>>>>> 
>>>>>> - Sam Ruby
>>>>>> 
>>>>>>>> On Aug 27, 2016, at 12:11 PM, Craig Russell <[email protected]> 
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>> Here’s what happens to the em-dash in whimsy pending.yml:
>>>>>>>> 
>>>>>>>> ---
>>>>>>>> - doctype: icla
>>>>>>>>   source: craig-russell-copy.pdf
>>>>>>>>   realname: Craig Russell Emdash
>>>>>>>>   pubname: Craig Russell Emdash
>>>>>>>>   email: [email protected]
>>>>>>>>   filename: craig-russell-emdash.pdf
>>>>>>>>   nname: Craig Russell
>>>>>>>>   nemail: [email protected]
>>>>>>>>   iname: Craig Russell
>>>>>>>>   iemail: [email protected]
>>>>>>>>   uname: Craig Russell
>>>>>>>>   uemail: [email protected]
>>>>>>>>   pname: Craig Russell
>>>>>>>>   pemail: [email protected]
>>>>>>>>   memail: [email protected]
>>>>>>>>   gname: Craig Russell
>>>>>>>>   gemail: [email protected]
>>>>>>>>   contact: Craig Russell
>>>>>>>>   cemail: [email protected]
>>>>>>>>   ipodling: " "
>>>>>>>>   email:addr: [email protected]
>>>>>>>>   email:id: "<[email protected]>"
>>>>>>>>   email:name: Craig Russell
>>>>>>>>   email:subject: !binary |-
>>>>>>>>     RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>>>>>>>   svn:mime-type: application/pdf
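The !binary block is just the subject's bytes in Base64. Decoding it with core Ruby yields an ASCII-8BIT string that reads correctly once relabeled as UTF-8; the payload below is copied from the listing above.

```ruby
# Base64 payload of email:subject, copied from the pending.yml listing.
payload = 'RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg'

# Base64-decode with String#unpack('m'); the result is tagged ASCII-8BIT.
bytes = payload.unpack('m').first

# Relabel the same bytes as UTF-8; E2 80 94 is the em-dash (U+2014).
subject = bytes.dup.force_encoding(Encoding::UTF_8)
puts subject if subject.valid_encoding?
```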
>>>>>>>> 
>>>>>>>> 
>>>>>>>>> On Aug 27, 2016, at 11:41 AM, Craig Russell 
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> This email (still pending) causes an error when sending mail.
>>>>>>>>> 
>>>>>>>>> I suspect it is because of the em-dash in the subject.
>>>>>>>>> 
>>>>>>>>> I don’t know how to look at or edit the pending.yml on the server.
>>>>>>>>> 
>>>>>>>>> From: Gosha Arinich <[email protected]>
>>>>>>>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>>>>>>>> Message-ID: <ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>
>>>>>>>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>>>>>>>> To: [email protected]
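The raw header carries the subject as an RFC 2047 encoded-word: charset UTF-8, Q encoding, _ for a space and =XX for each byte, so the em-dash arrives as the three bytes E2 80 94. A rough Ruby sketch of decoding a single such word, for illustration only; decode_q_word is hypothetical, it ignores folded and multi-word headers, and in the workbench this step is handled by the cron job's Python code.

```ruby
# Hypothetical helper: decode one RFC 2047 "Q" encoded-word such as
# =?UTF-8?Q?ICLA_=E2=80=94_Gosha?=. Anything else is returned unchanged.
def decode_q_word(word)
  m = word.match(/\A=\?([^?]+)\?[Qq]\?([^?]*)\?=\z/) or return word
  charset, text = m[1], m[2]
  # '_' stands for a space; unpack('M') turns each =XX into one raw byte.
  bytes = text.tr('_', ' ').unpack('M').first
  # Label the decoded bytes with the charset named in the encoded-word.
  bytes.force_encoding(charset)
end

subject = decode_q_word('=?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=')
puts subject
```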
>>>>>>>>> 
>>>>>>>>> So, two issues: the pending mail needs to be sent; the bug needs to 
>>>>>>>>> be fixed.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> 
>>>>>>>>> Craig
>>>>>>>>> 
>>>>>>>>>> Begin forwarded message:
>>>>>>>>>> 
>>>>>>>>>> From: Gosha Arinich <[email protected]>
>>>>>>>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>>>>>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>>>>>>>> To: [email protected]
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> Cheers,
>>>>>>>>>> Gosha
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Craig L Russell
>>>>>>>>> Secretary, Apache Software Foundation
>>>>>>>>> [email protected] http://db.apache.org/jdo
>>>>>>>> 
>>>>>>>> Craig L Russell
>>>>>>>> Architect
>>>>>>>> [email protected]
>>>>>>>> P.S. A good JDO? O, Gasp!