This morning I had a similar issue, but with the email:cc: !binary encoding.

Patching the subject line handling doesn’t fix the cc line. I’m concerned that *any* UTF8 encoding of email fields will cause the same issue.

Can we find the place where the !binary encoding is chosen instead of “normal” UTF8?

Thanks,

Craig

> On Aug 28, 2016, at 5:37 PM, Sam Ruby <[email protected]> wrote:
>
> On Sun, Aug 28, 2016 at 8:04 PM, Craig Russell <[email protected]>
> wrote:
>> Can you please take a look and see why the rescue didn’t work?
>
> Logs can be found here:
>
> https://whimsy.apache.org/members/log/
>
> In particular, https://whimsy.apache.org/members/log/whimsy_error.log
>
> What I am still seeing is:
>
> _ERROR #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT
> to UTF-8>, referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
>
> And further up the stack traceback:
>
> _WARN
> /usr/local/rvm/gems/ruby-2.3.1/gems/mail-2.6.4/lib/mail/message.rb:1887:in
> `to_s', referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
> _WARN /x1/srv/whimsy/www/secretary/workbench/file.cgi:318:in `block
> in send_email', referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
>
> So, you are not hitting the exception handler, and you are dying later
> when trying to convert the message (which includes a binary subject)
> into a string.
>
> The reason why you are not hitting the exception handler is that you
> are not calling force_encoding. A second problem is that if an
> exception were to be raised, you wouldn't be catching it as the
> exception needs to be qualified: Encoding::UndefinedConversionError
>
>> Thanks,
>>
>> Craig
>
> - Sam Ruby
>
>>> On Aug 28, 2016, at 4:30 PM, Sam Ruby <[email protected]> wrote:
>>>
>>> On Sun, Aug 28, 2016 at 6:15 PM, Craig Russell <[email protected]>
>>> wrote:
>>>> I’m blind here. I can’t see the pending.yml. I can’t see the error
>>>> console. I don’t even know if my change was pushed to production.
>>>>
>>>> What tools do I need to see what’s going on?
>>>
>>> What code is actually deployed can be seen on the last two lines of
>>> the status page: https://whimsy.apache.org/status/
>>>
>>> Nothing in the (current) workbench shows the raw contents of
>>> pending.yml. It would be easy to add as a new CGI script. It could
>>> even be added as a new action in file.cgi.
>>>
>>> Alternately, we could ask for you to be added to have shell access to
>>> whimsy-vm3.
>>>
>>>> Thanks,
>>>>
>>>> Craig
>>>
>>> - Sam Ruby
>>>
>>>>> On Aug 28, 2016, at 2:30 PM, Craig Russell <[email protected]>
>>>>> wrote:
>>>>>
>>>>>>
>>>>>> On Aug 28, 2016, at 6:04 AM, Sam Ruby <[email protected]> wrote:
>>>>>>
>>>>>> On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
>>>>>> <[email protected]> wrote:
>>>>>>> The processing of email::subject seems to be localized to file.cgi ca.
>>>>>>> 261
>>>>>>>
>>>>>>> # override subject?
>>>>>>> if vars.email_subject and !vars.email_subject.empty?
>>>>>>>   if vars.email_subject =~ /^re:\s/i
>>>>>>>     subject vars.email_subject
>>>>>>>   else
>>>>>>>     subject 'Re: ' + vars.email_subject
>>>>>>>   end
>>>>>>> end
>>>>>>>
>>>>>>> I can’t see where the actual problem is, but is there a way to either;
>>>>>>>
>>>>>>> 1. have whichever component created vars.email_subject recognize UTF-8
>>>>>>> characters and pass them as characters instead of binary
>>>>>>>
>>>>>>> 2. recognize that this has happened here and replace the subject with
>>>>>>> an innocuous subject based on the document type.
>>>>>>
>>>>>> All of your analysis seems to be on target.
>>>>>>
>>>>>> This is from the log:
>>>>>>
>>>>>> [Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
>>>>>> 139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
>>>>>> #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
>>>>>> UTF-8>, referer:
>>>>>> https://whimsy.apache.org/secretary/workbench/file.cgi
>>>>>>
>>>>>> Looking at pending.yml with the interactive ruby shell:
>>>>>>
>>>>>> $ irb
>>>>>> irb(main):001:0> require 'yaml'
>>>>>> => true
>>>>>> irb(main):002:0> pending = YAML.load_file('pending.yml')
>>>>>> => [{"doctype"=>"icla",
>>>>>> "source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
>>>>>> "realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
>>>>>> "email"=>"[email protected]", "filename"=>"heorhi-arynich.pdf",
>>>>>> "nname"=>"Gosha Arinich", "nemail"=>"[email protected]",
>>>>>> "iname"=>"Gosha Arinich", "iemail"=>"[email protected]",
>>>>>> "uname"=>"Gosha Arinich", "uemail"=>"[email protected]",
>>>>>> "pname"=>"Gosha Arinich", "pemail"=>"[email protected]",
>>>>>> "memail"=>"[email protected]", "gname"=>"Gosha Arinich",
>>>>>> "gemail"=>"[email protected]", "contact"=>"Gosha Arinich",
>>>>>> "cemail"=>"[email protected]", "ipodling"=>" ",
>>>>>> "email:addr"=>"[email protected]",
>>>>>> "email:id"=>"<ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>",
>>>>>> "email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
>>>>>> Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
>>>>>> irb(main):003:0> pending.first['email:subject']
>>>>>> => "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
>>>>>> irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
>>>>>> => "ICLA — Gosha Arinich aka goshakkk"
>>>>>>
>>>>>> Not surprising given the torturous path that the subject goes through
>>>>>> in the current workbench implementation. A cron job extracts the
>>>>>> subject line from the email using python libraries and puts it into a
>>>>>> svn property associated with the file. The workbench then uses the
>>>>>> command line to extract that property and parses the output from the
>>>>>> command. What is surprising is that if there is an error in handling
>>>>>> non-ASCII characters why it hasn't shown up before and more
>>>>>> frequently. I'm pretty sure that non-ASCII characters have been seen
>>>>>> before, and I'm not sure what is different about this email.
>>>>>
>>>>> I’ve seen plenty of non-ASCII characters but this is the first I’ve seen
>>>>> one in the triple-character UTF8 representation.
>>>>>>
>>>>>> In any case, suggested fixes:
>>>>>>
>>>>>> 1) add "vars.email_subject.force_encoding('utf-8') if
>>>>>> vars.email_subject.encoding == Encoding::BINARY" before the inner if
>>>>>> statement. It should be harmless in cases that currently work, and
>>>>>> should fix this case. In cases where the data is binary data that
>>>>>> can't be interpreted as utf-8, it will continue to blow up.
>>>>>>
>>>>>> 2) add 'begin...rescue...end' around the inner if statement. Note:
>>>>>> you don't need to set subject in the rescue clause as it was set by
>>>>>> the relevant erb file (e.g. icla.erb). More information on rescue
>>>>>> statements: http://phrogz.net/programmingruby/tut_exceptions.html
>>>>>>
>>>>>> These changes should enable you to process the currently pending action.
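
The two suggested fixes can be combined into one small sketch. This is not the actual file.cgi code: normalize_subject and the sample bytes are made up for illustration, and the helper returns nil where file.cgi would simply keep the subject already set by the erb file. It also illustrates why a rescue around the inner if never fires: joining 'Re: ' with a binary-tagged string succeeds, and the conversion error only surfaces later, when the text is transcoded to UTF-8.

```ruby
# Hypothetical helper; the name and the nil fallback are assumptions,
# not code from file.cgi.
def normalize_subject(raw)
  return nil if raw.nil? || raw.empty?
  subject = raw.dup
  # Relabel binary-tagged bytes as UTF-8. force_encoding changes only the
  # encoding tag, never the bytes themselves.
  subject.force_encoding(Encoding::UTF_8) if subject.encoding == Encoding::BINARY
  # Bytes that are not valid UTF-8 cannot be repaired by relabeling.
  return nil unless subject.valid_encoding?
  subject =~ /^re:\s/i ? subject : 'Re: ' + subject
end

# Sample data: the UTF-8 bytes of an em-dash, tagged as binary (ASCII-8BIT).
raw = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

# Concatenation alone raises nothing; the result is still tagged binary.
joined = 'Re: ' + raw

# The failure appears only when the string is transcoded. Note the fully
# qualified exception name.
begin
  joined.encode(Encoding::UTF_8)
  transcode_error = nil
rescue Encoding::UndefinedConversionError => e
  transcode_error = e
end

fixed  = normalize_subject(raw)              # relabeled, then prefixed
broken = normalize_subject("\xFF\xFE bad".b) # not valid UTF-8, so nil
```

Checking valid_encoding? before use avoids relying on an exception at all for data that is not UTF-8.
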
>>>>>
>>>>> Now waiting for deployment…
>>>>>
>>>>> Craig
>>>>>
>>>>>>
>>>>>>> Craig
>>>>>>
>>>>>> - Sam Ruby
>>>>>>
>>>>>>>> On Aug 27, 2016, at 12:11 PM, Craig Russell <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Here’s what happens to the em-dash in whimsy pending.yml:
>>>>>>>>
>>>>>>>> ---
>>>>>>>> - doctype: icla
>>>>>>>>   source: craig-russell-copy.pdf
>>>>>>>>   realname: Craig Russell Emdash
>>>>>>>>   pubname: Craig Russell Emdash
>>>>>>>>   email: [email protected]
>>>>>>>>   filename: craig-russell-emdash.pdf
>>>>>>>>   nname: Craig Russell
>>>>>>>>   nemail: [email protected]
>>>>>>>>   iname: Craig Russell
>>>>>>>>   iemail: [email protected]
>>>>>>>>   uname: Craig Russell
>>>>>>>>   uemail: [email protected]
>>>>>>>>   pname: Craig Russell
>>>>>>>>   pemail: [email protected]
>>>>>>>>   memail: [email protected]
>>>>>>>>   gname: Craig Russell
>>>>>>>>   gemail: [email protected]
>>>>>>>>   contact: Craig Russell
>>>>>>>>   cemail: [email protected]
>>>>>>>>   ipodling: " "
>>>>>>>>   email:addr: [email protected]
>>>>>>>>   email:id: "<[email protected]>"
>>>>>>>>   email:name: Craig Russell
>>>>>>>>   email:subject: !binary |-
>>>>>>>>     RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>>>>>>>   svn:mime-type: application/pdf
>>>>>>>>
>>>>>>>>
>>>>>>>>> On Aug 27, 2016, at 11:41 AM, Craig Russell
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> This email causes (still pending email) an error sending mail.
>>>>>>>>>
>>>>>>>>> I suspect it is because of the em-dash in the subject.
>>>>>>>>>
>>>>>>>>> I don’t know how to look at or edit the pending.yml on the server.
>>>>>>>>>
>>>>>>>>> From: Gosha Arinich <[email protected]>
>>>>>>>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>>>>>>>> Message-ID:
>>>>>>>>> <ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>
>>>>>>>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>>>>>>>> To: [email protected]
>>>>>>>>>
>>>>>>>>> So, two issues: the pending mail needs to be sent; the bug needs to
>>>>>>>>> be fixed.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Craig
>>>>>>>>>
>>>>>>>>>> Begin forwarded message:
>>>>>>>>>>
>>>>>>>>>> From: Gosha Arinich <[email protected]>
>>>>>>>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>>>>>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>>>>>>>> To: [email protected]
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Cheers,
>>>>>>>>>> Gosha
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Craig L Russell
>>>>>>>>> Secretary, Apache Software Foundation
>>>>>>>>> [email protected] http://db.apache.org/jdo
>>>>>>>>
>>>>>>>> Craig L Russell
>>>>>>>> Architect
>>>>>>>> [email protected]
>>>>>>>> P.S. A good JDO? O, Gasp!

Craig L Russell
Architect
[email protected]
P.S. A good JDO? O, Gasp!
