Matthew Donoughe created MNG-8240:
-------------------------------------
Summary: ComparableVersion incorrectly handles leading Unicode Nd
class zeros
Key: MNG-8240
URL: https://issues.apache.org/jira/browse/MNG-8240
Project: Maven
Issue Type: Bug
Environment: openjdk version "1.8.0_412"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_412-b08)
OpenJDK 64-Bit Server VM (Temurin)(build 25.412-b08, mixed mode)
Reporter: Matthew Donoughe
ComparableVersion supports positive decimal numbers of unlimited size. As an
optimization, the length of the number (in UTF-16 code units, i.e. Java chars)
determines whether the value is converted into an int, a long, or a BigInteger.
As another optimization, because the length determines the data type, an int
always compares as smaller than a long, which always compares as smaller than a
BigInteger. Leading 0s are removed to avoid the case where
00000000000000000001 > 2.
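To make the length-based ordering concrete, here is a minimal sketch of the idea.
It is not Maven's code: the class name, the method names and the 9/18 character
thresholds are assumptions chosen for illustration.

```java
import java.math.BigInteger;

// Illustrative sketch only; names and thresholds are assumptions, not Maven's code.
final class LengthRankSketch {
    // Rank 0 = fits an int, 1 = fits a long, 2 = needs arbitrary precision.
    static int rank(String digits) {
        if (digits.length() <= 9) return 0;   // 9 decimal digits always fit an int
        if (digits.length() <= 18) return 1;  // 18 decimal digits always fit a long
        return 2;
    }

    // Removes leading ASCII '0' characters, keeping at least one character.
    static String stripAsciiZeros(String digits) {
        int i = 0;
        while (i < digits.length() - 1 && digits.charAt(i) == '0') i++;
        return digits.substring(i);
    }

    // Compares by rank first; values are only compared within the same rank.
    static int compare(String a, String b, boolean strip) {
        if (strip) {
            a = stripAsciiZeros(a);
            b = stripAsciiZeros(b);
        }
        int byRank = Integer.compare(rank(a), rank(b));
        return byRank != 0 ? byRank : new BigInteger(a).compareTo(new BigInteger(b));
    }

    public static void main(String[] args) {
        String padded = "00000000000000000001"; // 20 characters
        System.out.println(compare(padded, "2", false)); // positive: rank 2 vs rank 0
        System.out.println(compare(padded, "2", true));  // negative: 1 < 2
    }
}
```

Without stripping, the padded value lands in the highest rank purely because of
its length, which is why the ordering depends on every leading zero being removed.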
However, only '0' (DIGIT ZERO, U+0030) is being removed. The code that segments
the version string into items uses
[Character.isDigit|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#isDigit(char)],
which uses
[Character.getType|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#getType(char)]
to check for
[DECIMAL_DIGIT_NUMBER|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#DECIMAL_DIGIT_NUMBER],
corresponding to the [Unicode Nd
class|https://www.fileformat.info/info/unicode/category/Nd/list.htm]. Parsing
into a number eventually uses
[Character.digit|https://docs.oracle.com/en/java/javase/22/docs/api/java.base/java/lang/Character.html#digit(char,int)],
which likewise supports Unicode Nd class digits. Leading zeros from other
scripts are therefore accepted as digits but never stripped, so they still
count toward the length. This leads to the following case:
{noformat}
java -jar \
  ~/.m2/repository/org/apache/maven/maven-artifact/3.9.4/maven-artifact-3.9.4.jar \
  ०००००००००००००००००००1 0000000000000000000000002
Display parameters as parsed by Maven (in canonical form and as a list of
tokens) and comparison result:
1. ०००००००००००००००००००1 -> 1; tokens: [1]
०००००००००००००००००००1 > 0000000000000000000000002
2. 0000000000000000000000002 -> 2; tokens: [2]
{noformat}
The 1 with 19 leading Devanagari zeros (U+0966) keeps its full length of 20
characters and is parsed as a BigInteger, while the 2 with 24 leading ASCII
zeros is stripped down to a single character and parsed as an int, so 1 > 2.
However, the canonicalization still works correctly, so if you canonicalize the
versions before comparing them you get 1 < 2 as expected. I don't know if that's
better or worse, because the order then depends on whether the versions were
canonicalized first, which makes it unstable.
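The Character behaviour behind this can be checked with the JDK alone. The
snippet below is only a demonstration of the standard library calls; the class
and helper names are made up for this example.

```java
// Demonstrates how the JDK treats DEVANAGARI DIGIT ZERO (U+0966).
final class NdDigitDemo {
    // Hypothetical helper: true for a decimal digit with value 0 that is not ASCII '0'.
    static boolean isNonAsciiZero(char c) {
        return Character.isDigit(c) && Character.digit(c, 10) == 0 && c != '0';
    }

    public static void main(String[] args) {
        char devanagariZero = '\u0966';
        System.out.println(Character.isDigit(devanagariZero));   // accepted as a digit
        System.out.println(Character.digit(devanagariZero, 10)); // numeric value 0
        System.out.println(devanagariZero == '0');               // but not the char '0'
        // Integer.parseInt resolves each char through Character.digit.
        System.out.println(Integer.parseInt("\u0966\u0966" + "1"));
        System.out.println(isNonAsciiZero(devanagariZero));
    }
}
```

So a comparison against the char '0' skips these characters even though both the
tokenizer and the number parser treat them as zeros.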
I guess the easy solution is to use Character.digit to check for int 0 instead
of char '0'.
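A rough sketch of what that could look like, assuming a standalone helper rather
than the actual ComparableVersion internals (the class and method names here are
hypothetical, and this is not a tested patch against Maven):

```java
// Sketch of zero stripping based on numeric value instead of the char '0'.
final class StripZerosSketch {
    // Hypothetical helper: drops every leading char whose decimal value is 0,
    // keeping at least one character so an all-zero input stays non-empty.
    static String stripLeadingZeros(String digits) {
        int i = 0;
        while (i < digits.length() - 1 && Character.digit(digits.charAt(i), 10) == 0) {
            i++;
        }
        return digits.substring(i);
    }

    public static void main(String[] args) {
        // Build a 1 preceded by 19 DEVANAGARI DIGIT ZERO characters.
        StringBuilder sb = new StringBuilder();
        for (int n = 0; n < 19; n++) {
            sb.append('\u0966');
        }
        sb.append('1');
        System.out.println(stripLeadingZeros(sb.toString()).length());
        System.out.println(stripLeadingZeros("0000000000000000000000002"));
    }
}
```

With this check both inputs from the example shrink to a single character, so
they would fall into the same size class and be compared by value.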