On Sat, Aug 27, 2016 at 11:23 PM, Craig Russell
<[email protected]> wrote:
> The processing of email::subject seems to be localized to file.cgi ca. 261
>
>           # override subject?
>           if vars.email_subject and !vars.email_subject.empty?
>             if vars.email_subject =~ /^re:\s/i
>               subject vars.email_subject
>             else
>               subject 'Re: ' + vars.email_subject
>             end
>           end
>
> I can’t see where the actual problem is, but is there a way to either:
>
> 1. have whichever component created vars.email_subject recognize UTF-8 
> characters and pass them as characters instead of binary
>
> 2. recognize that this has happened here and replace the subject with an 
> innocuous subject based on the document type.

All of your analysis seems to be on target.

This is from the log:

[Sat Aug 27 18:36:03.233539 2016] [cgi:error] [pid 3570:tid
139833343252224] [client 73.15.26.163:62667] AH01215: _ERROR
#<Encoding::UndefinedConversionError: "\\xE2" from ASCII-8BIT to
UTF-8>, referer:
https://whimsy.apache.org/secretary/workbench/file.cgi

Looking at pending.yml with the interactive ruby shell:

$ irb
irb(main):001:0> require 'yaml'
=> true
irb(main):002:0> pending = YAML.load_file('pending.yml')
=> [{"doctype"=>"icla",
"source"=>"Gosha-Arinich-me-goshakkk.name--icla.pdf",
"realname"=>"Heorhi Arynich", "pubname"=>"Gosha Arinich",
"email"=>"[email protected]", "filename"=>"heorhi-arynich.pdf",
"nname"=>"Gosha Arinich", "nemail"=>"[email protected]",
"iname"=>"Gosha Arinich", "iemail"=>"[email protected]",
"uname"=>"Gosha Arinich", "uemail"=>"[email protected]",
"pname"=>"Gosha Arinich", "pemail"=>"[email protected]",
"memail"=>"[email protected]", "gname"=>"Gosha Arinich",
"gemail"=>"[email protected]", "contact"=>"Gosha Arinich",
"cemail"=>"[email protected]", "ipodling"=>" ",
"email:addr"=>"[email protected]",
"email:id"=>"<ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>",
"email:name"=>"Gosha Arinich", "email:subject"=>"ICLA \xE2\x80\x94
Gosha Arinich aka goshakkk", "svn:mime-type"=>"application/pdf"}]
irb(main):003:0> pending.first['email:subject']
=> "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk"
irb(main):004:0> pending.first['email:subject'].force_encoding('utf-8')
=> "ICLA — Gosha Arinich aka goshakkk"

Not surprising given the tortuous path that the subject goes through
in the current workbench implementation.  A cron job extracts the
subject line from the email using Python libraries and puts it into an
svn property associated with the file.  The workbench then uses the
command line to extract that property and parses the output from the
command.  What is surprising is that, if non-ASCII characters are
being mishandled somewhere along that path, the error hasn't shown up
before, and more frequently.  I'm pretty sure that non-ASCII characters
have been seen before, and I'm not sure what is different about this
email.
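
For reference, here is a minimal sketch of one way to get the same
exception class as in the log.  It assumes the failure comes from
transcoding a string that is tagged as binary (ASCII-8BIT); I have not
confirmed which call in file.cgi actually triggers it.  The helper name
transcode_failure is made up for this illustration.

```ruby
# Hypothetical helper (not part of the workbench): returns the class of
# the exception raised when transcoding to UTF-8, or nil on success.
def transcode_failure(str)
  str.encode('UTF-8')
  nil
rescue EncodingError => e
  e.class
end

# Same bytes as the subject in pending.yml, tagged as binary via String#b.
raw = "ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b

p raw.encoding                                        # binary / ASCII-8BIT
p transcode_failure(raw)                              # transcoding fails
p transcode_failure(raw.dup.force_encoding('utf-8'))  # relabelled copy is fine
```

The bytes are identical in both cases; only the encoding label on the
string differs, which is why force_encoding is enough to recover the
em-dash in the irb session above.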

In any case, suggested fixes:

1) add "vars.email_subject.force_encoding('utf-8') if
vars.email_subject.encoding == Encoding::BINARY" before the inner if
statement.  It should be harmless in cases that currently work, and
should fix this case.  In cases where the data is binary data that
can't be interpreted as utf-8, it will continue to blow up.

2) add 'begin...rescue...end' around the inner if statement.  Note:
you don't need to set subject in the rescue clause as it was set by
the relevant erb file (e.g. icla.erb).  More information on rescue
statements: http://phrogz.net/programmingruby/tut_exceptions.html

These changes should enable you to process the currently pending action.
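
A rough sketch of how the two suggestions could fit together, written
as a standalone method so it can be tried outside of file.cgi.  The
method name reply_subject and the default_subject parameter are made up
for this illustration (default_subject stands in for the subject
already set by the erb file), and the valid_encoding? check is an extra
guard that is not part of either suggestion.

```ruby
# Hypothetical helper, not the actual file.cgi code.
# default_subject stands in for the subject set by e.g. icla.erb.
def reply_subject(email_subject, default_subject)
  return default_subject if email_subject.nil? || email_subject.empty?

  # Work on a copy so the caller's string is not modified.
  subject = email_subject.dup

  # Suggestion 1: relabel binary-tagged bytes as UTF-8 (bytes unchanged).
  subject.force_encoding('utf-8') if subject.encoding == Encoding::BINARY

  # Extra guard (an assumption, see above): bytes that are not valid
  # UTF-8 fall back to the default instead of failing later.
  return default_subject unless subject.valid_encoding?

  subject =~ /^re:\s/i ? subject : 'Re: ' + subject
rescue EncodingError
  # Suggestion 2: on an encoding failure, keep the default subject.
  default_subject
end

puts reply_subject("ICLA \xE2\x80\x94 Gosha Arinich aka goshakkk".b,
                   'Your ICLA has been filed')
```

With the rescue in place, the worst case is that the reply goes out
with the template's subject rather than the original one.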

> Craig

- Sam Ruby

>> On Aug 27, 2016, at 12:11 PM, Craig Russell <[email protected]> wrote:
>>
>> Here’s what happens to the em-dash in whimsy pending.yml:
>>
>> ---
>> - doctype: icla
>>  source: craig-russell-copy.pdf
>>  realname: Craig Russell Emdash
>>  pubname: Craig Russell Emdash
>>  email: [email protected]
>>  filename: craig-russell-emdash.pdf
>>  nname: Craig Russell
>>  nemail: [email protected]
>>  iname: Craig Russell
>>  iemail: [email protected]
>>  uname: Craig Russell
>>  uemail: [email protected]
>>  pname: Craig Russell
>>  pemail: [email protected]
>>  memail: [email protected]
>>  gname: Craig Russell
>>  gemail: [email protected]
>>  contact: Craig Russell
>>  cemail: [email protected]
>>  ipodling: " "
>>  email:addr: [email protected]
>>  email:id: "<[email protected]>"
>>  email:name: Craig Russell
>>  email:subject: !binary |-
>>    RU0gZGFzaCBjYXVzZXMgdHJvdWJsZSDigJQg
>>  svn:mime-type: application/pdf
>>
>>
>>> On Aug 27, 2016, at 11:41 AM, Craig Russell <[email protected]> 
>>> wrote:
>>>
>>> This email causes (still pending email) an error sending mail.
>>>
>>> I suspect it is because of the em-dash in the subject.
>>>
>>> I don’t know how to look at or edit the pending.yml on the server.
>>>
>>> From: Gosha Arinich <[email protected]>
>>> Date: Sat, 27 Aug 2016 03:03:00 +0300
>>> Message-ID: 
>>> <ca+ttpjt-+d5_o4uqksv+1dbs_fafwfy4zrmtjspxey48ae3...@mail.gmail.com>
>>> Subject: =?UTF-8?Q?ICLA_=E2=80=94_Gosha_Arinich_aka_goshakkk?=
>>> To: [email protected]
>>>
>>> So, two issues: the pending mail needs to be sent; the bug needs to be 
>>> fixed.
>>>
>>> Thanks,
>>>
>>> Craig
>>>
>>>> Begin forwarded message:
>>>>
>>>> From: Gosha Arinich <[email protected]>
>>>> Subject: ICLA — Gosha Arinich aka goshakkk
>>>> Date: August 26, 2016 at 5:03:00 PM PDT
>>>> To: [email protected]
>>>>
>>>>
>>>>
>>>> --
>>>> Cheers,
>>>> Gosha
>>>>
>>>
>>> Craig L Russell
>>> Secretary, Apache Software Foundation
>>> [email protected] http://db.apache.org/jdo
>>
>> Craig L Russell
>> Architect
>> [email protected]
>> P.S. A good JDO? O, Gasp!
>>
>>
>>
>>
>>
>
> Craig L Russell
> Architect
> [email protected]
> P.S. A good JDO? O, Gasp!
>
>
>
>
>
