massdosage commented on code in PR #629:
URL: https://github.com/apache/httpcomponents-client/pull/629#discussion_r2022416194


##########
httpclient5/src/test/java/org/apache/hc/client5/http/psl/TestPublicSuffixMatcher.java:
##########
@@ -284,14 +284,14 @@ void testGetDomainRootPublicSuffixList() {
         checkPublicSuffix("shishi.中国", "shishi.中国");
         checkPublicSuffix("中国", null);
         // Same as above, but punycoded.
-        checkPublicSuffix("xn--85x722f.com.cn", "xn--85x722f.com.cn");
-        checkPublicSuffix("xn--85x722f.xn--55qx5d.cn", "xn--85x722f.xn--55qx5d.cn");
-        checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn", "xn--85x722f.xn--55qx5d.cn");
-        checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.xn--55qx5d.cn");
+        checkPublicSuffix("xn--85x722f.Com.Cn", "食狮.com.cn");
+        checkPublicSuffix("xn--85x722f.xn--55qx5d.CN", "食狮.公司.cn");
+        checkPublicSuffix("www.xn--85x722f.xn--55qx5d.cn", "食狮.公司.cn");
+        checkPublicSuffix("shishi.xn--55qx5d.cn", "shishi.公司.cn");
         checkPublicSuffix("xn--55qx5d.cn", null);
-        checkPublicSuffix("xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s");
-        checkPublicSuffix("www.xn--85x722f.xn--fiqs8s", "xn--85x722f.xn--fiqs8s");
-        checkPublicSuffix("shishi.xn--fiqs8s", "shishi.xn--fiqs8s");
+        checkPublicSuffix("xn--85x722f.xn--fiqs8s", "食狮.中国");
+        checkPublicSuffix("www.xn--85x722f.xn--fiqs8s", "食狮.中国");
+        checkPublicSuffix("shishi.xn--fiqs8s", "shishi.中国");

Review Comment:
   I agree that the spec could be a lot clearer and would benefit from having a list of examples, but failing that, all we have to go on is the test suite that they use and other implementations of the standard.
   
   Just to be clear, the change from Punycode to Unicode isn't my personal need; it appears to be how all the PSL libraries listed at https://publicsuffix.org/learn/ work. For comparison, I passed `xn--85x722f.com.cn` as input to a number of them to see whether `xn--85x722f.com.cn` would be returned as per the test, and they all behaved the same:
   
   Using https://github.com/hamano/regdom4j/:
   ```
   ❯ java -jar regdom4j-1.0.3.jar xn--85x722f.com.cn
   xn--85x722f.com.cn
   ```
   
   Using Guava
   ```
   import com.google.common.net.InternetDomainName;

   InternetDomainName domain = InternetDomainName.from("xn--85x722f.com.cn");
   System.out.println(domain.publicSuffix());
   // output: com.cn
   System.out.println(domain.topDomainUnderRegistrySuffix());
   // output: xn--85x722f.com.cn
   ```
   
   Using https://github.com/whois-server-list/public-suffix-list
   ```
   PublicSuffixListFactory factory = new PublicSuffixListFactory();
   PublicSuffixList suffixList = factory.build();
   System.out.println(suffixList.getPublicSuffix("xn--85x722f.com.cn"));
   //output com.cn
   System.out.println(suffixList.getRegistrableDomain("xn--85x722f.com.cn"));
   //output xn--85x722f.com.cn
   ```
   
   Using https://pypi.org/project/publicsuffix2/
   ```
   from publicsuffix2 import get_sld

   print(get_sld('xn--85x722f.com.cn'))
   # output xn--85x722f.com.cn
   ```
   
   Using https://pypi.org/project/publicsuffixlist/
   ```
   from publicsuffixlist import PublicSuffixList

   psl = PublicSuffixList()
   print(psl.suffix('xn--85x722f.com.cn'))
   # output xn--85x722f.com.cn
   ```
   
   Using https://pypi.org/project/tldextract/
   ```
   import tldextract

   print(tldextract.extract('xn--85x722f.com.cn'))
   # output ExtractResult(subdomain='', domain='xn--85x722f', suffix='com.cn', is_private=False)
   ```
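   For reference, the Unicode values in the updated test are only the decoded forms of the Punycode values the libraries return (`xn--85x722f.com.cn` is `食狮.com.cn`, `xn--85x722f.xn--55qx5d.cn` is `食狮.公司.cn`, `xn--85x722f.xn--fiqs8s` is `食狮.中国`). A minimal sketch using only the JDK's `java.net.IDN` shows the mapping in both directions; the `PunycodeForms` class is made up for illustration and isn't part of HttpClient:

   ```java
   import java.net.IDN;

   public class PunycodeForms {
       public static void main(String[] args) {
           // Registrable domains as returned by the libraries above (Punycode/ACE form)
           String[] asciiForms = {
               "xn--85x722f.com.cn",
               "xn--85x722f.xn--55qx5d.cn",
               "xn--85x722f.xn--fiqs8s"
           };
           for (String ascii : asciiForms) {
               // Decode each "xn--" label; plain ASCII labels such as "com" pass through unchanged
               String unicode = IDN.toUnicode(ascii);
               // Encoding the Unicode form again yields the original Punycode string
               String roundTrip = IDN.toASCII(unicode);
               System.out.println(ascii + " -> " + unicode + " -> " + roundTrip);
           }
       }
   }
   ```

   So both forms name the same domain; the question is only which form the matcher should hand back.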
   
   So I think most users would be surprised if the commons matcher worked differently.
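   To make "working the same" concrete, here is a rough sketch of the behaviour the libraries above show: labels are decoded to Unicode only to look up the rule, and the registrable domain is then returned in whatever form the caller passed in. This is not the actual `PublicSuffixMatcher` code; `FormPreservingLookup`, `registrableDomain` and the four hard-coded rules are hypothetical, and wildcard, exception and default `*` rules are left out:

   ```java
   import java.net.IDN;
   import java.util.Arrays;
   import java.util.Locale;
   import java.util.Set;

   public class FormPreservingLookup {

       // Hypothetical, hard-coded subset of PSL rules, stored in Unicode form.
       // No wildcard ("*.") rules, exception ("!") rules or default "*" rule.
       private static final Set<String> RULES = Set.of(
               "cn",
               "com.cn",
               "\u516c\u53f8.cn", // Unicode form of xn--55qx5d.cn
               "\u4e2d\u56fd");   // Unicode form of xn--fiqs8s

       /**
        * Returns the registrable domain (public suffix plus one label) in the same
        * Punycode or Unicode form as the input, or null if there is none.
        */
       public static String registrableDomain(String domain) {
           // Case is normalised, but Punycode labels are kept as Punycode
           String[] original = domain.toLowerCase(Locale.ROOT).split("\\.");
           String[] unicode = new String[original.length];
           for (int i = 0; i < original.length; i++) {
               // Decode to Unicode only so that the label can be compared with RULES
               unicode[i] = IDN.toUnicode(original[i]);
           }
           // Smallest start index = longest matching suffix = the public suffix
           for (int start = 0; start < unicode.length; start++) {
               String candidate = String.join(".", Arrays.copyOfRange(unicode, start, unicode.length));
               if (RULES.contains(candidate)) {
                   if (start == 0) {
                       return null; // the whole input is itself a public suffix
                   }
                   // Rebuild the result from the caller's labels, not the decoded ones
                   return String.join(".", Arrays.copyOfRange(original, start - 1, original.length));
               }
           }
           return null; // no rule matched (the real algorithm would apply the default "*" rule)
       }

       public static void main(String[] args) {
           System.out.println(registrableDomain("xn--85x722f.com.cn"));            // xn--85x722f.com.cn
           System.out.println(registrableDomain("www.xn--85x722f.xn--55qx5d.cn")); // xn--85x722f.xn--55qx5d.cn
           System.out.println(registrableDomain("xn--55qx5d.cn"));                 // null
       }
   }
   ```

   Because the result is built from the caller's labels, Punycode input gives Punycode output and Unicode input gives Unicode output, with lowercasing as the only normalisation applied to the returned value.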
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@hc.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

