On Wed, Oct 24, 2012 at 07:10:36PM +0200, Krzysztof Mazur wrote:
> > > - if ($broken_encoding{$t} && !is_rfc2047_quoted($subject)) {
> > > + if ($broken_encoding{$t} && !is_rfc2047_quoted($subject) &&
> > > + ($subject =~ /[^[:ascii:]]/)) {
> >
> > Is that test sufficient? We would also need to encode if it has rfc2047
> > specials, no?
>
> For Subject this should be sufficient. According to RFC822 after
> "Subject:" we have "text" token,
> [...]
> So the problem only exists for broken RFC2047-like texts, but I think
> it's ok to just pass such subjects, in most cases the Subject comes
> from already formatted patch file. I think that we just want to fix Subjects
> without specified encoding here.
Right, but I was specifically worried about raw "=?", which is only an
issue due to rfc2047 itself.

However, reading the patch again, we are already checking for that with
is_rfc2047_quoted. It might miss the case where we have =? but not the
rest of a valid encoded word, but any compliant parser should recognize
that and leave it be.

So I think your original patch is actually correct.
> I think we can go even further, we can just add quote_subject(),
> which performs this test and calls quote_rfc2047() if necessary.
> I'm sending below patch that does that.
Yeah, it would still be nice to keep the logic in one place.
> diff --git a/git-send-email.perl b/git-send-email.perl
> index efeae4c..e9aec8d 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -657,9 +657,7 @@ EOT
> $initial_subject = $1;
> my $subject = $initial_subject;
> $_ = "Subject: " .
> - ($subject =~ /[^[:ascii:]]/ ?
> - quote_rfc2047($subject, $compose_encoding) :
> - $subject) .
> + quote_subject($subject, $compose_encoding) .
Hrm. Isn't this one technically a regression if the $subject contains
encoded words? IOW, in this case we feed quote_subject a known-raw
header; any rfc2047 in it would want to be encoded to be preserved.
But in this case:
> @@ -1327,9 +1341,8 @@ foreach my $t (@files) {
> $body_encoding = $auto_8bit_encoding;
> }
>
> - if ($broken_encoding{$t} && !is_rfc2047_quoted($subject) &&
> - ($subject =~ /[^[:ascii:]]/)) {
> - $subject = quote_rfc2047($subject, $auto_8bit_encoding);
> + if ($broken_encoding{$t}) {
> + $subject = quote_subject($subject, $auto_8bit_encoding);
> }
We have a possibly already-encoded header, and we would want to avoid
double-encoding it.
In the first case, the "wants quoting" logic should be:

  is_rfc2047_quoted($subject) || /[^[:ascii:]]/

and in the latter case it would be:

  !is_rfc2047_quoted($subject) && /[^[:ascii:]]/
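
To make the two cases concrete, here is a rough, untested-against-git
sketch of the two predicates. The helper names are hypothetical, and
looks_rfc2047_quoted() is only a simplified stand-in for the real
is_rfc2047_quoted() (it assumes a single encoded word and skips the
length and token checks):

```perl
use strict;
use warnings;

# Hypothetical, simplified stand-in for is_rfc2047_quoted(): true only if
# the whole string is one encoded word of the form =?charset?enc?text?=
sub looks_rfc2047_quoted {
    my ($s) = @_;
    return $s =~ /^=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=$/ ? 1 : 0;
}

# Case 1: a known-raw subject (the first hunk). Anything that looks like an
# encoded word must itself be encoded so that it survives literally.
sub raw_subject_wants_quoting {
    my ($s) = @_;
    return (looks_rfc2047_quoted($s) || $s =~ /[^[:ascii:]]/) ? 1 : 0;
}

# Case 2: a subject that may already be encoded (the second hunk).
# Skip it if it is already an encoded word, to avoid double-encoding.
sub file_subject_wants_quoting {
    my ($s) = @_;
    return (!looks_rfc2047_quoted($s) && $s =~ /[^[:ascii:]]/) ? 1 : 0;
}
```

The only difference between the two is how an existing encoded word is
treated, which is why a single quote_subject() cannot serve both callers
without being told which kind of input it has.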
-Peff