Hi Uchino, I think your request is sensible in general.

Do you intend String.codePointCount to require a beginIndex? I think a no-arg 
version suffices.

Also forwarding this to i18n-dev as it is the locale-related list.

P.S. When you reply, make sure you click "Reply all" so that all the recipients 
of this mail get your reply. Otherwise, the reply is sent only to me, and 
others on the list won't see it.

Regards, Chen
________________________________
From: core-libs-dev <core-libs-dev-r...@openjdk.org> on behalf of Uchino 
Tatsunori <tat...@live.jp>
Sent: Monday, August 11, 2025 6:54 AM
To: core-libs-dev@openjdk.org <core-libs-dev@openjdk.org>
Subject: I'd like add no-argument overloads to CharSequence, String, and 
StringBuilder (JDK-8364007)

Dear core-libs developers,

I'd like to add the following overloads:

• Character.codePointCount(CharSequence seq)
• Character.codePointCount(char[] a)
• String.codePointCount(int beginIndex)
• StringBuffer.codePointCount()
• StringBuilder.codePointCount()

and created a patch (https://github.com/openjdk/jdk/pull/26461).

Why:

JSR 204 (JDK-4985217) already added similar overloads that take start and end 
indices. They appear to have been designed with a priority on versatility. 
Because they make the indices mandatory, they have the following 
disadvantages:

1. The string expression has to be written twice. Unlike C#, Java has no 
equivalent of extension methods.
2. Unnecessary boundary checks are mixed in.
3. Most userland code tries to calculate the number of code points in the 
entire string.
4. Some other languages can count the number of code points with a single 
function call and no extra arguments (e.g. len() in Python 3).
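
To illustrate points 1 and 3, here is a minimal sketch of how the whole-string 
count is written with the existing JSR 204 overloads. The class name 
CodePointCountDemo and the helper countCodePoints are hypothetical names used 
only for this example; they are not part of the JDK or of the patch.

```java
final class CodePointCountDemo {
    // Hypothetical helper (not a JDK method): counts the code points in the
    // whole sequence using the existing three-argument overload.
    static int countCodePoints(CharSequence seq) {
        return Character.codePointCount(seq, 0, seq.length());
    }

    public static void main(String[] args) {
        // 'a', U+1F600 (one code point, encoded as a surrogate pair), 'b'
        String s = "a\uD83D\uDE00b";

        // length() counts UTF-16 code units.
        System.out.println(s.length()); // prints 4

        // Existing API: the string expression appears twice.
        System.out.println(s.codePointCount(0, s.length())); // prints 3

        // The same count through the hypothetical helper.
        System.out.println(countCodePoints(s)); // prints 3
    }
}
```

With a longer receiver expression, such as a chain of getter calls, the 
repetition in the second form usually means introducing a local variable first.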

For 3., e.g.:

• VARCHAR in MySQL & PostgreSQL counts characters in units of code points. 
e.g. VARCHAR(20) means that the limit is 20 code points, not 20 UTF-16 code 
units (20 chars in Java).
• NIST Special Publication 800-63B stipulates that the password length must be 
counted in units of code points. (Quote from 
https://pages.nist.gov/800-63-3/sp800-63b.html#-5112-memorized-secret-verifiers 
: "For purposes of the above length requirements, each Unicode code point SHALL 
be counted as a single character.")
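
As a rough sketch of why this matters for such limits, the following compares 
a check based on length() with one based on code points. The class name 
CodePointLimitDemo, the method fitsWithin, and the limit of 20 are assumptions 
chosen for illustration; String.repeat requires Java 11 or later.

```java
final class CodePointLimitDemo {
    // Hypothetical check (not a JDK method): true if the value has at most
    // maxCodePoints code points, regardless of how many chars it occupies.
    static boolean fitsWithin(String value, int maxCodePoints) {
        return value.codePointCount(0, value.length()) <= maxCodePoints;
    }

    public static void main(String[] args) {
        // 15 copies of U+1F600: 15 code points, but 30 UTF-16 code units.
        String value = "\uD83D\uDE00".repeat(15);

        // A limit check based on chars rejects the value.
        System.out.println(value.length() <= 20); // prints false

        // A limit check based on code points accepts it.
        System.out.println(fitsWithin(value, 20)); // prints true
    }
}
```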

I would like to get agreement on these changes and would like to know what I 
have to do outside of GitHub (e.g., how to submit CSRs). If you have a GitHub 
account, it would be helpful if you could reply to the PR. If not, you can 
reply directly to this email.

Best Regards,

Tatsunori Uchino
https://github.com/tats-u/
