> On Aug 28, 2016, at 6:04 AM, Sam Ruby <[email protected]> wrote:
>
> On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
> <[email protected]> wrote:
>> The processing of email::subject seems to be localized to file.cgi ca. 261
>>
>> # override subject?
>> if vars.email_subject and !vars.email_subject.empty?
>>   if vars.email_subject =~ /^re:\s/i
>>     subject vars.email_subject
>>   else
>>     subject 'Re: ' + vars.email_subject
>>   end
>> end
>>
>> I can’t see where the actual problem is, but is there a way to either:
>>
>> 1. have whichever component created vars.email_subject recognize UTF-8
>> characters and pass them as characters instead of binary
>>
>> 2. recognize that this has happened here and replace the subject with an
>> innocuous subject based on the document type.
>
> All of your analysis seems to be on target.
>
> This is from the log:
>
> [Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
> 139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
> #<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
> UTF-8>, referer:
> https://whimsy.apache.org/secretary/workbench/file.cgi
>
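For reference, the error in the log can be reproduced outside the workbench. This is only a sketch: it assumes the failure comes from some later step transcoding the binary-tagged subject to UTF-8 (the log does not say which call that is), and the variable names are made up for illustration.

```ruby
# The subject as it appears in pending.yml: UTF-8 bytes tagged as binary.
# (.b returns a copy of the string with ASCII-8BIT encoding.)
subject = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

# Transcoding binary data to UTF-8 fails on the first byte >= 0x80,
# which is the same class of error that shows up in the log.
begin
  subject.encode('UTF-8')
  outcome = :converted
rescue Encoding::UndefinedConversionError
  outcome = :undefined_conversion
end

# Relabeling the same bytes as UTF-8 (no transcoding) gives a valid string.
repaired = subject.dup.force_encoding('UTF-8')
puts outcome
puts repaired.valid_encoding?
```

The difference is that encode tries to convert each byte, while force_encoding only changes the label on bytes that are already UTF-8.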
> Looking at pending.yml with the interactive ruby shell:
>
> $ irb
> irb(main):001:0> require 'yaml'
> => true
> irb(main):002:0> pending = YAML.load_file('pending.yml')
> => [{"doctype"=>"icla",
> "source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
> "realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
> "email"=>"[email protected]", "filename"=>"heorhi-arynich.pdf",
> "nname"=>"Gosha Arinich", "nemail"=>"[email protected]",
> "iname"=>"Gosha Arinich", "iemail"=>"[email protected]",
> "uname"=>"Gosha Arinich", "uemail"=>"[email protected]",
> "pname"=>"Gosha Arinich", "pemail"=>"[email protected]",
> "memail"=>"[email protected]", "gname"=>"Gosha Arinich",
> "gemail"=>"[email protected]", "contact"=>"Gosha Arinich",
> "cemail"=>"[email protected]", "ipodling"=>" ",
> "email:addr"=>"[email protected]",
> "email:id"=>"<ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>",
> "email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
> Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
> irb(main):003:0> pending.first['email:subject']
> => "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
> irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
> => "ICLA — Gosha Arinich aka goshakkk"
>
> Not surprising given the tortuous path that the subject goes through
> in the current workbench implementation. A cron job extracts the
> subject line from the email using python libraries and puts it into a
> svn property associated with the file. The workbench then uses the
> command line to extract that property and parses the output from the
> command. What is surprising is that, if there is an error in handling
> non-ASCII characters, it hasn't shown up before, and more
> frequently. I'm pretty sure that non-ASCII characters have been seen
> before, and I'm not sure what is different about this email.
I’ve seen plenty of non-ASCII characters, but this is the first time I’ve seen
one in the three-byte UTF-8 representation.
>
> In any case, suggested fixes:
>
> 1) add 'vars.email_subject.force_encoding("utf-8") if
> vars.email_subject.encoding == Encoding::BINARY' before the inner if
> statement. It should be harmless in cases that currently work, and
> should fix this case. In cases where the data is binary data that
> can't be interpreted as utf-8, it will continue to blow up.
>
> 2) add 'begin...rescue...end' around the inner if statement. Note:
> you don't need to set subject in the rescue clause as it was set by
> the relevant erb file (e.g. icla.erb). More information on rescue
> statements: http://phrogz.net/programmingruby/tut_exceptions.html
>
> These changes should enable you to process the currently pending action.
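A minimal sketch of how the two suggestions could fit together, for anyone following along. The helper name reply_subject and its fallback argument are hypothetical and not part of file.cgi. It uses an explicit valid_encoding? check in place of the begin/rescue from suggestion 2, so the fallback path does not depend on which call happens to raise.

```ruby
# Hypothetical helper (not in file.cgi): returns the subject to use for the
# reply, or the given fallback when the stored subject is unusable.
def reply_subject(raw, fallback)
  return fallback if raw.nil? || raw.empty?

  subject = raw.dup
  # Suggestion 1: bytes tagged as binary are relabeled as UTF-8.
  subject.force_encoding('utf-8') if subject.encoding == Encoding::BINARY

  # Stand-in for suggestion 2: keep the default subject if the bytes
  # are not valid UTF-8 after all.
  return fallback unless subject.valid_encoding?

  subject =~ /^re:\s/i ? subject : 'Re: ' + subject
end

puts reply_subject("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b, 'ICLA')
```

In the workbench the fallback would be whatever subject the erb file already set, so nothing needs to be assigned in that branch.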
Now waiting for deployment…
Craig
>
>> Craig
>
> - Sam Ruby
>
>>> On Aug 27, 2016, at 12:11 PM, Craig Russell <[email protected]>
>>> wrote:
>>>
>>> Here’s what happens to the em-dash in whimsy pending.yml:
>>>
>>> ---
>>> - doctype: icla
>>> source: craig-russell-copy.pdf
>>> realname: Craig Russell Emdash
>>> pubname: Craig Russell Emdash
>>> email: [email protected]
>>> filename: craig-russell-emdash.pdf
>>> nname: Craig Russell
>>> nemail: [email protected]
>>> iname: Craig Russell
>>> iemail: [email protected]
>>> uname: Craig Russell
>>> uemail: [email protected]
>>> pname: Craig Russell
>>> pemail: [email protected]
>>> memail: [email protected]
>>> gname: Craig Russell
>>> gemail: [email protected]
>>> contact: Craig Russell
>>> cemail: [email protected]
>>> ipodling: " "
>>> email:addr: [email protected]
>>> email:id: "<[email protected]>"
>>> email:name: Craig Russell
>>> email:subject: !binary |-
>>> RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>> svn:mime-type: application/pdf
>>>
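The "!binary" value above is just Base64, so it can be checked by hand. A small sketch using only core Ruby (the variable names are illustrative); the encoded text is copied from the YAML above:

```ruby
# Base64 payload copied from the email:subject entry above.
encoded = 'RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg'

# unpack1('m') decodes Base64 and returns a binary (ASCII-8BIT) string.
raw = encoded.unpack1('m')

# The decoded bytes are valid UTF-8, so relabeling them recovers the text,
# including the em-dash (bytes E2 80 94).
text = raw.dup.force_encoding('UTF-8')
puts text.valid_encoding?
puts text.inspect
```

YAML falls back to "!binary" when it is handed a string that is tagged as binary and contains non-ASCII bytes, which matches what the irb session shows for the other entry.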
>>>
>>>> On Aug 27, 2016, at 11:41 AM, Craig Russell <[email protected]>
>>>> wrote:
>>>>
>>>> This email causes (still pending email) an error sending mail.
>>>>
>>>> I suspect it is because of the em-dash in the subject.
>>>>
>>>> I don’t know how to look at or edit the pending.yml on the server.
>>>>
>>>> From: Gosha Arinich <[email protected]>
>>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>>> Message-ID:
>>>> <ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>
>>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>>> To: [email protected]
>>>>
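The Subject header above is an RFC 2047 encoded word in Q encoding. As a rough sketch of what a correct decode looks like (decode_q_word is a made-up helper, handles only a single Q-encoded word, and is not what the cron job or the workbench actually runs):

```ruby
# Hypothetical helper: decodes one "=?charset?Q?text?=" encoded word.
# Anything that does not match that shape is returned unchanged.
def decode_q_word(word)
  m = /\A=\?([^?]+)\?[Qq]\?([^?]*)\?=\z/.match(word)
  return word unless m

  charset = m[1]
  # In Q encoding "_" stands for a space; "=XX" is a hex-encoded byte,
  # which unpack1('M') (quoted-printable) decodes into raw bytes.
  bytes = m[2].tr('_', ' ').unpack1('M')

  # Label the raw bytes with the declared charset instead of leaving
  # them binary. An unknown charset name raises ArgumentError here.
  bytes.force_encoding(charset)
end

puts decode_q_word('=?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=')
```

The key step is the last one: the charset travels with the header, so the decoded bytes never need to be left tagged as binary.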
>>>> So, two issues: the pending mail needs to be sent; the bug needs to be
>>>> fixed.
>>>>
>>>> Thanks,
>>>>
>>>> Craig
>>>>
>>>>> Begin forwarded message:
>>>>>
>>>>> From: Gosha Arinich <[email protected]>
>>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>>> To: [email protected]
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Gosha
>>>>>
>>>>
>>>> Craig L Russell
>>>> Secretary, Apache Software Foundation
>>>> [email protected] http://db.apache.org/jdo
>>>
>>> Craig L Russell
>>> Architect
>>> [email protected]
>>> P.S. A good JDO? O, Gasp!
>>>
>>>
>>>
>>>
>>>
>>
>> Craig L Russell
>> Architect
>> [email protected]
>> P.S. A good JDO? O, Gasp!
Craig L Russell
Architect
[email protected]
P.S. A good JDO? O, Gasp!