https://bz.apache.org/SpamAssassin/show_bug.cgi?id=8272

--- Comment #12 from Sidney Markowitz <sid...@sidney.com> ---
Before I submit a patch to Node.pm I want to clear up two questions I have
about the existing code in sub _normalize, so that I'm not patching what I
don't understand:

In the block
  elsif ($charset_declared =~ /^UTF[ -]?16/i) {
when decoding succeeds, it executes
  return $_[0]  if !$return_decoded;
which throws away the result of the decoding.

I understand doing that for a declared charset of UTF-8, or for any of the
ASCII-extension charsets when the raw string contains only 7-bit characters:
!$return_decoded means the caller expects a valid UTF-8 byte string, and with
those charsets $_[0] already is valid UTF-8 if it is all 7-bit. But how is
return $_[0] correct for UTF-16? Shouldn't it do
   utf8::encode($rv)  if !$return_decoded;
   return $rv;
as is done at the very end of the sub? (Note utf8::encode modifies its
argument in place, so its return value is not what we want to return.)
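To illustrate the concern, here is a minimal sketch (not code from Node.pm;
the sample byte string and variable names are my own):

```perl
use strict;
use warnings;
use Encode ();

# "hi" as UTF-16BE with a BOM: raw octets, not valid UTF-8 text
my $raw = "\xFE\xFF\x00\x68\x00\x69";

# Decoding with the declared charset yields the character string "hi";
# LEAVE_SRC keeps $raw intact, FB_CROAK dies on malformed input
my $decoded = Encode::decode('UTF-16',
                             $raw,
                             Encode::FB_CROAK | Encode::LEAVE_SRC);

# Returning $raw unchanged would hand back UTF-16 octets (BOM plus NULs),
# not a UTF-8 byte string; the decoded value must be re-encoded instead:
my $utf8_bytes = $decoded;
utf8::encode($utf8_bytes);   # in place: characters -> UTF-8 octets
```

So for a UTF-16 input the raw bytes and the expected UTF-8 byte string are
never the same thing, even for purely ASCII content.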

Second question: is there a reason that in some places "1|8" is used to
hard-code the flag values instead of Encode::FB_CROAK|Encode::LEAVE_SRC? I
would change those as a matter of style if I'm going to patch that code
anyway. I also see that require Encode is wrapped in an eval as if the module
were optional, yet there are uses of Encode:: in the code that are not
optional. Since Encode is a standard module bundled with perl, shouldn't it be
loaded with a plain use Encode instead of a require in a BEGIN block? Only
Encode::Detect::Detector is properly handled as an optional module.
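For reference, the numeric flags and the named constants are interchangeable,
which a quick one-liner confirms (again not code from Node.pm):

```perl
use strict;
use warnings;
use Encode ();   # Encode has been a core module since perl 5.7.3

# The hard-coded 1|8 in Node.pm corresponds to these named constants:
printf "FB_CROAK=%d LEAVE_SRC=%d combined=%d\n",
       Encode::FB_CROAK,
       Encode::LEAVE_SRC,
       Encode::FB_CROAK | Encode::LEAVE_SRC;   # combined == 1|8 == 9
```

Using the named constants would make the intent (croak on malformed input,
leave the source string untouched) obvious at the call sites.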
