https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8307

            Bug ID: 8307
           Summary: Add dedicated class for parsing headers
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Hardware: PC
                OS: Windows 10
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Libraries
          Assignee: dev@spamassassin.apache.org
          Reporter: k...@mxguardian.net
  Target Milestone: Undefined

Created attachment 5995
  --> https://bz.apache.org/SpamAssassin/attachment.cgi?id=5995&action=edit
proposed patch

This is a proposed patch that adds a class, Mail::SpamAssassin::Header, and a
subclass, Mail::SpamAssassin::Header::ParameterHeader. The impetus for this
change was to add full support for RFC 2231 Parameter Value Continuations and
Character Set Encoding. These typically appear in the "name" parameter of the
Content-Type header or the "filename" parameter of the Content-Disposition
header.
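
For readers unfamiliar with RFC 2231, here is a minimal sketch of how
continuations and charset-encoded values can be merged. It is written in
Python for illustration only and is not the code in the attached patch; the
function name decode_rfc2231 and the list-of-pairs input format are
assumptions made for this example.

```python
import re
from urllib.parse import unquote

# Matches "filename", "filename*", "filename*0", and "filename*1*".
_PARAM_NAME = re.compile(
    r"^(?P<base>[^*]+)(?:\*(?P<index>\d+))?(?P<encoded>\*)?$"
)


def _percent_decode(value, charset):
    """Percent-decode `value`, falling back to latin-1 for unknown charsets."""
    try:
        return unquote(value, encoding=charset, errors="replace")
    except LookupError:
        return unquote(value, encoding="latin-1", errors="replace")


def decode_rfc2231(params):
    """Merge RFC 2231 segments from a list of (name, value) pairs.

    Hypothetical helper: values are assumed to have their surrounding
    quotes already removed.
    """
    plain = {}     # parameters without any "*" suffix
    segments = {}  # base name -> [(index, is_encoded, raw value), ...]
    for name, value in params:
        match = _PARAM_NAME.match(name.strip())
        if match is None:
            continue
        base = match.group("base").lower()
        if match.group("index") is None and not match.group("encoded"):
            plain[base] = value
            continue
        index = int(match.group("index") or 0)
        segments.setdefault(base, []).append(
            (index, bool(match.group("encoded")), value)
        )

    result = dict(plain)
    for base, parts in segments.items():
        parts.sort(key=lambda part: part[0])  # reassemble in index order
        charset = "us-ascii"
        pieces = []
        for position, (_index, encoded, value) in enumerate(parts):
            if not encoded:
                pieces.append(value)
                continue
            # Only the first segment carries the charset'language' prefix.
            if position == 0 and value.count("'") >= 2:
                declared, _language, value = value.split("'", 2)
                charset = declared or charset
            pieces.append(_percent_decode(value, charset))
        # The extended form replaces a plain parameter of the same name.
        result[base] = "".join(pieces)
    return result
```

For example, the pairs ("filename*0*", "UTF-8''%e2%82%ac%20") and
("filename*1", "rates.pdf") are merged into a single "filename" value. The
sketch decodes each segment separately, so a multi-byte character split
across two segments would not be decoded correctly.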

I noticed that these headers were being parsed in multiple places with
different degrees of accuracy. My first objective was to develop a parser class
that could be thoroughly covered by unit tests and reused across core modules
and plugins. The resulting ParameterHeader class can parse any header that has
a main value followed by a number of name=value parameters separated by
semicolons. Primarily these are headers such as Content-Type and
Content-Disposition, but I tried to make it flexible enough that it could also
be used to parse headers such as Authentication-Results, Received-SPF, ARC-*,
etc.
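
To make the "main value plus name=value parameters" shape concrete, here is a
minimal sketch in Python. It is an illustration only and not the
ParameterHeader implementation from the patch; the function name
parse_parameter_header is hypothetical, and the sketch ignores parenthesized
comments and header folding.

```python
def parse_parameter_header(raw):
    """Split 'value; name=value; name="quoted; value"' into (value, params).

    Hypothetical helper: returns the main value and a list of
    (lowercased name, value) pairs in their original order.
    """
    fields = []
    current = []
    in_quotes = False
    escaped = False
    for ch in raw:
        if escaped:
            # Character following a backslash inside quotes is kept literally.
            current.append(ch)
            escaped = False
        elif ch == "\\" and in_quotes:
            escaped = True
        elif ch == '"':
            # Quotes delimit a value and are not part of it.
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            # An unquoted semicolon ends the current field.
            fields.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    fields.append("".join(current).strip())

    main_value = fields[0]
    params = []
    for field in fields[1:]:
        if not field:
            continue  # tolerate a trailing or doubled semicolon
        name, _sep, value = field.partition("=")
        params.append((name.strip().lower(), value.strip()))
    return main_value, params
```

Called on 'text/plain; charset="utf-8"; name="a;b.txt"', this returns
"text/plain" as the main value and keeps the quoted semicolon inside the
"name" parameter instead of splitting on it.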

The second design goal was to let plugins read the part filename directly from
$part->{name}, eliminating the need to parse headers a second time. Because of
this change, I had to deprecate the option
"olemacro_prefer_contentdisposition". The OLEVBMacro plugin now gets the
filename from $part->{name} rather than by parsing headers. I don't know how,
or if, this will affect anyone. olemacro_prefer_contentdisposition defaults to
1, which means to prefer taking the name from Content-Disposition, and that is
exactly how $part->{name} gets assigned. However, that behavior is currently
hard-coded, with no option to override it. I'm curious to hear any feedback
about this.

I removed Util::get_part_details and Util::_decode_part_header because they
were only being used by OLEVBMacro. (They were previously part of OLEVBMacro
but were moved into Util at some point.)

I know that this project uses "Commit-Then-Review", but I thought this change
warranted a review before committing. Feedback is appreciated.
