massdosage commented on code in PR #629:
URL:
https://github.com/apache/httpcomponents-client/pull/629#discussion_r2021437718
##########
httpclient5/src/test/java/org/apache/hc/client5/http/psl/TestPublicSuffixMatcher.java:
##########
@@ -284,14 +284,14 @@ void testGetDomainRootPublicSuffixList() {
checkPublicSuffix("shishi.中国", "shishi.中国");
checkPublicSuffix("中国", null);
// Same as above, but punycoded.
- checkPublicSuffix("xn--85x722f.com.cn", "xn--85x722f.com.cn");
- checkPublicSuffix("xn--85x722f.xn--55qx5d.cn",
"xn--85x722f.xn--55qx5d.cn");
- checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn",
"xn--85x722f.xn--55qx5d.cn");
- checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.xn--55qx5d.cn");
+ checkPublicSuffix("xn--85x722f.Com.Cn", "食狮.com.cn");
+ checkPublicSuffix("xn--85x722f.xn--55qx5d.CN", "食狮.公司.cn");
+ checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn", "食狮.公司.cn");
+ checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.公司.cn");
checkPublicSuffix("xn--55qx5d.cn", null);
- checkPublicSuffix("xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s");
- checkPublicSuffix("www.xn--85x722f.xn--fiqs8s",
"xn--85x722f.xn--fiqs8s");
- checkPublicSuffix("shishi.xn--fiqs8s", "shishi.xn--fiqs8s");
+ checkPublicSuffix("xn--85x722f.xn--fiqs8s", "食狮.中国");
+ checkPublicSuffix("www.xn--85x722f.xn--fiqs8s", "食狮.中国");
+ checkPublicSuffix("shishi.xn--fiqs8s", "shishi.中国");
Review Comment:
I don't see anywhere in the standard that says that if one passes Punycode in,
one should expect to get Unicode out. The line you quote above comes, I think,
from the "[Entry
Specification](https://github.com/publicsuffix/list/wiki/Format#entry-specification)"
section, which defines the layout of the PSL _file_, not how an implementation
behaves. The algorithm is defined later on that page under
https://github.com/publicsuffix/list/wiki/Format#algorithm but it doesn't
specifically mention Punycode.
My understanding is that the set of unit tests they provide exists precisely to
avoid inconsistency: if an implementation produces the same results as their
unit tests, then it's correct. We can see they purposefully added this
behaviour a long time ago via this commit:
https://github.com/publicsuffix/list/commit/ddc97474bc8d0de6b70de6ac37125a371e6df439#diff-7ff3771a2abbfd9f8dfc636e6fd2ba9ebb72f59f791ed6df380066c66a9f4179R28
There is a comment there that says "The EffectiveTLDService always gives back
punycoded labels.", which is the behaviour we see in the unit tests.
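
To make the "Punycode in, Punycode out" expectation concrete, here is a minimal sketch of it. This is not the code in this PR and not anything in the PSL project: `PunycodeRoundTrip` and `matchInputForm` are hypothetical names, and the sketch assumes the matcher works on Unicode labels internally and converts back at the end. The only real API used is the JDK's `java.net.IDN`.

```java
import java.net.IDN;

public final class PunycodeRoundTrip {

    // Hypothetical helper: return the registrable domain in the same form
    // (Punycode or Unicode) that the caller used for the input.
    public static String matchInputForm(String input, String unicodeResult) {
        if (unicodeResult == null) {
            return null; // the input was itself a public suffix
        }
        // Assumption: an all-ASCII input means the caller passed Punycode.
        boolean inputIsAscii = input.chars().allMatch(c -> c < 0x80);
        return inputIsAscii ? IDN.toASCII(unicodeResult) : unicodeResult;
    }

    public static void main(String[] args) {
        // Punycode input: the result is converted back to Punycode.
        System.out.println(matchInputForm("www.xn--85x722f.xn--fiqs8s", "食狮.中国"));
        // prints xn--85x722f.xn--fiqs8s

        // Unicode input: the result stays in Unicode.
        System.out.println(matchInputForm("www.食狮.中国", "食狮.中国"));
        // prints 食狮.中国
    }
}
```

With something along these lines, the original expectations in this test (Punycode on both sides) would keep passing, which is what the upstream test vectors check.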
I agree that it's not all very clear, but I take the set of tests they provide,
and run against incoming contributions, to be what they consider "correct".
Anything claiming to implement the standard should behave the same way for the
same input.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]