On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell <[email protected]> wrote:
> The processing of email::subject seems to be localized to file.cgi ca. 261
>
>     # override subject?
>     if vars.email_subject and !vars.email_subject.empty?
>       if vars.email_subject =~ /^re:\s/i
>         subject vars.email_subject
>       else
>         subject 'Re: ' + vars.email_subject
>       end
>     end
>
> I can’t see where the actual problem is, but is there a way to either;
>
> 1. have whichever component created vars.email_subject recognize UTF-8
> characters and pass them as characters instead of binary
>
> 2. recognize that this has happened here and replace the subject with an
> innocuous subject based on the document type.

All of your analysis seems to be on target. This is from the log:

[Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid 139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to UTF-8>, referer: https://whimsy.apache.org/secretary/workbench/file.cgi

Looking at pending.yml with the interactive Ruby shell:

$ irb
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> pending = YAML.load_file('pending.yml')
=> [{"doctype"=>"icla", "source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf", "realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich", "email"=>"[email protected]", "filename"=>"heorhi-arynich.pdf", "nname"=>"Gosha Arinich", "nemail"=>"[email protected]", "iname"=>"Gosha Arinich", "iemail"=>"[email protected]", "uname"=>"Gosha Arinich", "uemail"=>"[email protected]", "pname"=>"Gosha Arinich", "pemail"=>"[email protected]", "memail"=>"[email protected]", "gname"=>"Gosha Arinich", "gemail"=>"[email protected]", "contact"=>"Gosha Arinich", "cemail"=>"[email protected]", "ipodling"=>" ", "email:addr"=>"[email protected]", "email:id"=>"<ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>", "email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
irb(main):003:0> pending.first['email:subject']
=> "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
=> "ICLA — Gosha Arinich aka goshakkk"

Not surprising given the tortuous path that the subject goes through in the current workbench implementation. A cron job extracts the subject line from the email using Python libraries and puts it into an svn property associated with the file. The workbench then uses the command line to extract that property and parses the output from the command.

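For anyone who wants to reproduce this outside the workbench, here is a minimal sketch. It assumes the subject arrives as the same bytes shown in the irb session, tagged as binary (ASCII-8BIT); the variable names are made up for illustration and are not taken from file.cgi.

```ruby
# Hypothetical stand-in for the value loaded from pending.yml: the bytes
# from the irb session above, tagged as binary (ASCII-8BIT).
subject = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

# Transcoding binary bytes >= 0x80 to UTF-8 has no defined mapping, so
# this raises Encoding::UndefinedConversionError, as seen in the log.
error = begin
  subject.encode('utf-8')
  nil
rescue Encoding::UndefinedConversionError => e
  e
end

# force_encoding only relabels the existing bytes (no conversion), which
# yields a valid UTF-8 string here. dup keeps the original untouched.
fixed = subject.dup.force_encoding('utf-8')

puts error.class
puts fixed.encoding
puts fixed.valid_encoding?
```

The difference is that encode tries to convert the bytes, while force_encoding just changes the label on them.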
What is surprising is that, if there is an error in handling non-ASCII characters, it hasn't shown up before and more frequently. I'm pretty sure that non-ASCII characters have been seen before, and I'm not sure what is different about this email.

In any case, suggested fixes:

1) add "vars.email_subject.force_encoding('utf-8') if vars.email_subject.encoding == Encoding::BINARY" before the inner if statement. It should be harmless in cases that currently work, and should fix this case. In cases where the data is binary data that can't be interpreted as utf-8, it will continue to blow up.

2) add 'begin...rescue...end' around the inner if statement. Note: you don't need to set subject in the rescue clause as it was set by the relevant erb file (e.g. icla.erb). More information on rescue statements:

http://phrogz.net/programmingruby/tut_exceptions.html

These changes should enable you to process the currently pending action.

> Craig

- Sam Ruby

>> On Aug 27, 2016, at 12:11 PM, Craig Russell <[email protected]> wrote:
>>
>> Here’s what happens to the em-dash in whimsy pending.yml:
>>
>> ---
>> - doctype: icla
>>   source: craig-russell-copy.pdf
>>   realname: Craig Russell Emdash
>>   pubname: Craig Russell Emdash
>>   email: [email protected]
>>   filename: craig-russell-emdash.pdf
>>   nname: Craig Russell
>>   nemail: [email protected]
>>   iname: Craig Russell
>>   iemail: [email protected]
>>   uname: Craig Russell
>>   uemail: [email protected]
>>   pname: Craig Russell
>>   pemail: [email protected]
>>   memail: [email protected]
>>   gname: Craig Russell
>>   gemail: [email protected]
>>   contact: Craig Russell
>>   cemail: [email protected]
>>   ipodling: " "
>>   email:addr: [email protected]
>>   email:id: "<[email protected]>"
>>   email:name: Craig Russell
>>   email:subject: !binary |-
>>     RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>   svn:mime-type: application/pdf
>>
>>
>>> On Aug 27, 2016, at 11:41 AM, Craig Russell <[email protected]>
>>> wrote:
>>>
>>> This email causes (still pending email) an error sending mail.
>>>
>>> I suspect it is because of the em-dash in the subject.
>>>
>>> I don’t know how to look at or edit the pending.yml on the server.
>>>
>>> From: Gosha Arinich <[email protected]>
>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>> Message-ID:
>>> <ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>
>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>> To: [email protected]
>>>
>>> So, two issues: the pending mail needs to be sent; the bug needs to be
>>> fixed.
>>>
>>> Thanks,
>>>
>>> Craig
>>>
>>>> Begin forwarded message:
>>>>
>>>> From: Gosha Arinich <[email protected]>
>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>> To: [email protected]
>>>>
>>>> --
>>>> Cheers,
>>>> Gosha
>>>>
>>>
>>> Craig L Russell
>>> Secretary, Apache Software Foundation
>>> [email protected] http://db.apache.org/jdo
>>
>> Craig L Russell
>> Architect
>> [email protected]
>> P.S. A good JDO? O, Gasp!
>
> Craig L Russell
> Architect
> [email protected]
> P.S. A good JDO? O, Gasp!
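
For reference, the two fixes suggested earlier in this message (force_encoding before the inner if statement, and begin...rescue...end around it) could be combined roughly as follows. This is only a sketch under stated assumptions: reply_subject and set_subject are hypothetical names, and set_subject stands in for the mail DSL's subject call, which is assumed here to reject text that cannot be represented as valid UTF-8. The real file.cgi may fail at a different point.

```ruby
# Hypothetical stand-in for the mail DSL's `subject` call. Assumption: it
# transcodes to UTF-8 and rejects text that is not valid UTF-8.
def set_subject(text)
  utf8 = text.encode('utf-8') # raises for binary strings with high bytes
  raise EncodingError, 'invalid UTF-8 in subject' unless utf8.valid_encoding?
  utf8
end

# Hypothetical helper mirroring the "override subject?" block in file.cgi.
# default_subject plays the role of the subject set by the erb template.
def reply_subject(email_subject, default_subject)
  subject = default_subject
  if email_subject and !email_subject.empty?
    begin
      # Fix 1: relabel binary bytes as UTF-8 (dup leaves the caller's
      # string untouched).
      if email_subject.encoding == Encoding::BINARY
        email_subject = email_subject.dup.force_encoding('utf-8')
      end

      # The original inner if statement.
      if email_subject =~ /^re:\s/i
        subject = set_subject(email_subject)
      else
        subject = set_subject('Re: ' + email_subject)
      end
    rescue StandardError
      # Fix 2: nothing to do, the template's subject stays in place.
    end
  end
  subject
end

puts reply_subject("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b, 'ICLA')
puts reply_subject("\xFF\xFE".b, 'ICLA')
```

With fix 1 alone, bytes that are not valid UTF-8 would still fail further on; the rescue clause is what lets the template's subject survive in that case.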
