massdosage commented on code in PR #629:
URL:
https://github.com/apache/httpcomponents-client/pull/629#discussion_r2022416194
##########
httpclient5/src/test/java/org/apache/hc/client5/http/psl/TestPublicSuffixMatcher.java:
##########
@@ -284,14 +284,14 @@ void testGetDomainRootPublicSuffixList() {
checkPublicSuffix("shishi.中国", "shishi.中国");
checkPublicSuffix("中国", null);
// Same as above, but punycoded.
- checkPublicSuffix("xn--85x722f.com.cn", "xn--85x722f.com.cn");
- checkPublicSuffix("xn--85x722f.xn--55qx5d.cn",
"xn--85x722f.xn--55qx5d.cn");
- checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn",
"xn--85x722f.xn--55qx5d.cn");
- checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.xn--55qx5d.cn");
+ checkPublicSuffix("xn--85x722f.Com.Cn", "食狮.com.cn");
+ checkPublicSuffix("xn--85x722f.xn--55qx5d.CN", "食狮.公司.cn");
+ checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn", "食狮.公司.cn");
+ checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.公司.cn");
checkPublicSuffix("xn--55qx5d.cn", null);
- checkPublicSuffix("xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s");
- checkPublicSuffix("www.xn--85x722f.xn--fiqs8s",
"xn--85x722f.xn--fiqs8s");
- checkPublicSuffix("shishi.xn--fiqs8s", "shishi.xn--fiqs8s");
+ checkPublicSuffix("xn--85x722f.xn--fiqs8s", "食狮.中国");
+ checkPublicSuffix("www.xn--85x722f.xn--fiqs8s", "食狮.中国");
+ checkPublicSuffix("shishi.xn--fiqs8s", "shishi.中国");
Review Comment:
I agree that the spec could be a lot clearer and would benefit from a list of
examples, but failing that, all we have to go on is the test suite they use
and other implementations of the standard.
Just to be clear, the change from Punycode to Unicode isn't my personal
need; it appears to be how all the libraries which deal with the PSL listed at
https://publicsuffix.org/learn/ work. For comparison, I ran a number of them
with `xn--85x722f.com.cn` as input to see whether `xn--85x722f.com.cn` would be
returned, as per the test, and they all behaved the same:
Using https://github.com/hamano/regdom4j/:
```
❯ java -jar regdom4j-1.0.3.jar xn--85x722f.com.cn
xn--85x722f.com.cn
```
Using Guava:
```
import com.google.common.net.InternetDomainName;

InternetDomainName domain = InternetDomainName.from("xn--85x722f.com.cn");
System.out.println(domain.publicSuffix());
//output com.cn
System.out.println(domain.topDomainUnderRegistrySuffix());
//output xn--85x722f.com.cn
```
Using https://github.com/whois-server-list/public-suffix-list:
```
PublicSuffixListFactory factory = new PublicSuffixListFactory();
PublicSuffixList suffixList = factory.build();
System.out.println(suffixList.getPublicSuffix("xn--85x722f.com.cn"));
//output com.cn
System.out.println(suffixList.getRegistrableDomain("xn--85x722f.com.cn"));
//output xn--85x722f.com.cn
```
Using https://pypi.org/project/publicsuffix2/:
```
from publicsuffix2 import get_sld
print(get_sld('xn--85x722f.com.cn'))
# output xn--85x722f.com.cn
```
Using https://pypi.org/project/publicsuffixlist/:
```
from publicsuffixlist import PublicSuffixList
psl = PublicSuffixList()
print(psl.suffix('xn--85x722f.com.cn'))
# output xn--85x722f.com.cn
```
Using https://pypi.org/project/tldextract/:
```
import tldextract
print(tldextract.extract('xn--85x722f.com.cn'))
# output ExtractResult(subdomain='', domain='xn--85x722f', suffix='com.cn', is_private=False)
```
So I think most users would be surprised if the commons matcher worked
differently.
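
To illustrate the behaviour those libraries show (Punycode in, Punycode out),
here is a minimal sketch using only the JDK's `java.net.IDN`. It is not code
from this PR or from `PublicSuffixMatcher`: the class `IdnFormSketch` and the
helpers `isPunycoded` and `inInputForm` are hypothetical names made up for the
example, and it assumes the matcher works on the Unicode form internally.

```java
import java.net.IDN;
import java.util.Locale;

// Hypothetical sketch, not part of httpclient5: return a result in the same
// form (Punycode or Unicode) that the caller used for the input.
public class IdnFormSketch {

    // True if any label of the host carries the ACE prefix "xn--" (case-insensitive).
    public static boolean isPunycoded(String host) {
        for (String label : host.split("\\.")) {
            if (label.toLowerCase(Locale.ROOT).startsWith("xn--")) {
                return true;
            }
        }
        return false;
    }

    // Assumes the matcher produced a Unicode result; re-encodes it to ASCII
    // only when the input itself was Punycode.
    public static String inInputForm(String input, String unicodeResult) {
        if (unicodeResult == null) {
            return null;
        }
        return isPunycoded(input) ? IDN.toASCII(unicodeResult) : unicodeResult;
    }

    public static void main(String[] args) {
        String ace = "xn--85x722f.com.cn";
        // Stand-in for what a Unicode-based matcher would return for this input.
        String unicodeResult = IDN.toUnicode(ace);

        // Punycode input: prints xn--85x722f.com.cn
        System.out.println(inInputForm(ace, unicodeResult));

        // Unicode input: the Unicode result is printed unchanged.
        System.out.println(inInputForm(unicodeResult, unicodeResult));
    }
}
```

With something along these lines, the old expectation
`checkPublicSuffix("xn--85x722f.com.cn", "xn--85x722f.com.cn")` would still hold
while Unicode input keeps returning Unicode.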
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]