On 7/22/20 1:43 PM, naoto.s...@oracle.com wrote:
Thanks Roger,
Ah, I just saw your email just after I sent mine!
They probably saw each other crossing and said hi on the way to inboxes ;-)
On 7/22/20 1:38 PM, Roger Riggs wrote:
Hi Naoto,
Looks fine. (with Joe's suggestion)
On 7/22/20 4:20 PM, Joe Wang wrote:
Hi Naoto,
The change looks good to me. "supLower" is indeed super slow :-)
The only minor comment I have is that the compareCodePointCI method
performs toUpperCase unconditionally. That's not a problem for the
regular case, where a check on cp1 == cp2 (line 337) is done prior
to the method call. But for the sup case (starting at line 341), the
method is called unconditionally while in webrev.04 there was a
check "cp1 != cp2". One option to fix it is to include the "cp1 !=
cp2" check in the method compareCodePointCI, then cp1 == cp2 at line
337 can be omitted.
I would have added to line 353 the same cp1 == cp2 check as 337 to
avoid the method call
unless it was needed.
As in the previous email, cp1 != cp2 at that point, since either high
or low surrogates differ.
Make sense. The webrev looks good to me.
-Joe
Naoto
Roger
Regards,
Joe
On 7/22/20 10:23 AM, naoto.s...@oracle.com wrote:
Hi,
I revised the fix again, based on further suggestions:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.05/
Changes from v.04 are (all in StringUTF16.java):
- The short cut now does case insensitive comparison that makes the
fix closer to the previous implementation (for BMP characters).
- Changed the bit operation to negating for detecting needed index
increment.
- Method name is changed to better reflect what it is doing, with
more descriptive comments.
Here is the benchmark results:
before:
Benchmark Mode Cnt Score Error
Units
StringCompareToIgnoreCase.lower avgt 25 49.960 ? 1.923
ns/op
StringCompareToIgnoreCase.supLower avgt 25 21.003 ? 0.354
ns/op
StringCompareToIgnoreCase.supUpperLower avgt 25 30.863 ? 4.529
ns/op
StringCompareToIgnoreCase.upperLower avgt 25 15.417 ? 1.046
ns/op
after:
Benchmark Mode Cnt Score Error
Units
StringCompareToIgnoreCase.lower avgt 25 46.857 ? 0.524
ns/op
StringCompareToIgnoreCase.supLower avgt 25 148.688 ? 6.546
ns/op
StringCompareToIgnoreCase.supUpperLower avgt 25 37.160 ? 0.259
ns/op
StringCompareToIgnoreCase.upperLower avgt 25 15.126 ? 0.338
ns/op
Now non-supplementary operations ("lower" and "upperLower") are on
par with the "before" result (I am not quite sure why the "after"
results are somewhat faster though). For supplementary test cases,
"supLower" is very slow. The reason is two fold; one is because
"before" one exits at the very first character (which I am
addressing here) while "after" continues to compare to the last
characters, the other reason is the test suffers from the change
where supplementary cases double the case insensitivity checks
(compared to the "after" result just below). Also "supUpperLower"
gets slower for the same reason. These are expected results for
supplementary comparisons (as we discussed).
Naoto
On 7/17/20 4:36 PM, naoto.s...@oracle.com wrote:
Hi,
Based on the suggestions, I modified the fix as follows:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.01/
Changes from the initial revision are:
- Shared the implementation between compareToCI() and
regionMatchesCI()
- Enabled immediate short cut if two code points match.
- Created a simple JMH benchmark. Here is the scores before and
after the change:
before:
Benchmark Mode Cnt Score Error
Units
StringCompareToIgnoreCase.lower avgt 25 53.764 ? 2.811
ns/op
StringCompareToIgnoreCase.supLower avgt 25 24.211 ? 1.135
ns/op
StringCompareToIgnoreCase.supUpperLower avgt 25 30.595 ? 1.344
ns/op
StringCompareToIgnoreCase.upperLower avgt 25 18.859 ? 1.499
ns/op
after:
Benchmark Mode Cnt Score Error
Units
StringCompareToIgnoreCase.lower avgt 25 58.354 ? 4.603
ns/op
StringCompareToIgnoreCase.supLower avgt 25 57.975 ? 5.672
ns/op
StringCompareToIgnoreCase.supUpperLower avgt 25 23.912 ? 0.965
ns/op
StringCompareToIgnoreCase.upperLower avgt 25 17.744 ? 0.272
ns/op
Here, "sup" means all supplementary characters, BMP otherwise.
"lower" means each character requires upper->lower casemap.
"upperLower" means all characters are the same, except the last
character which requires casemap.
I think the result is reasonable, considering surrogates check are
now mandatory. I have tried Roger's suggestion to use
Arrays.mismatch() but it did not seem to benefit here. In fact,
the performance degraded partly because I implemented the short
cut, and possibly for the overhead of extra checks.
Naoto
On 7/15/20 9:00 AM, naoto.s...@oracle.com wrote:
Hello,
Please review the fix to the following issues:
https://bugs.openjdk.java.net/browse/JDK-8248655
https://bugs.openjdk.java.net/browse/JDK-8248434
The proposed changeset and its CSR are located at:
https://cr.openjdk.java.net/~naoto/8248655.8248434/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8248664
A bug was filed against SimpleDateFormat (8248434) where
case-insensitive date format/parse failed in some of the new
locales in JDK15. The root cause was that case-insensitive
String.regionMatches() method did not work with supplementary
characters. The problem is that the method's spec does not expect
case mappings of supplementary characters, possibly because it
was overlooked in the first place, JSR 204 - "Unicode
Supplementary Character support". Similar behavior is observed in
other two case-insensitive methods, i.e., compareToIgnoreCase()
and equalsIgnoreCase().
The fix is straightforward to compare strings by code point
basis, instead of code unit (16bit "char") basis. Technically
this change will introduce a backward incompatibility, but I
believe it is an incompatibility to wrong behavior, not true to
the meaning of those methods' expectations.
Naoto