-=| gregor herrmann, 13.08.2020 17:20:32 +0200 |=- > Package: pkg-perl-tools > Version: 0.62 > Severity: minor > > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA512 > > In #968362 I've used the ellipsis UTF-8 character to denote an elison > like: '[…]'. > > After running `dpt forward 968362' the nice 3 dots end up in the GitHub > issue at https://github.com/maxmind/MaxMind-DB-Writer-perl/issues/112 > as: '[â�¦]' > > Looks like there's some (UTF-8 or other) encoding directive missing > somewhere …
I think the problem is in giving Net::GitHub::V3 a binary string instead of unicode string. Here's why: Net::GitHub::V3 uses JSON::MaybeXS to construct the HTTP request payload. JSON::MaybeXS prefers Cpanel::JSON::XS which encodes the input in UTF-8. I have tried this is a terminal that works with UTF-8: # working $ perl -MCpanel::JSON::XS -MEncode \ -wE'my $test="tæst"; say $test, encode_json({test=>decode_utf8($test)})' tæst{"test":"tæst"} # not working $ perl -MCpanel::JSON::XS -MEncode \ -wE'my $test="tæst"; say $test, encode_json({test=>$test})' tæst{"test":"tæst"} Note the STDOUT handle has no encoding layer and there is no 'use utf8' in effect -- the input string is binary. The difference is that the not working case passes a binary string. My guess is that encode_json() tries to encode its arguments, basically invoking Encode's encode_utf8() on them, resulting in double encoding. Something like: $ perl -MEncode -wE'my $test="tæst"; say $test, encode_utf8($test)' tæsttæst The working case works because encode_json() is given an unicode string, which produces an utf8-encoded binary string when given to encode_utf8(). So my proposal is to make sure that all github related methods decode message contents before giving them to Net::GitHub::V3. Perhaps handling that in Message::edit_message:93 would be enough, but that is also used with non-github backends and would need testing. -- Damyan