Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-16 Thread via GitHub


ron-ladin6 closed pull request #737: [TEXT-103] Add support for weighted 
Levenshtein distance
URL: https://github.com/apache/commons-text/pull/737


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-16 Thread via GitHub


chtompki commented on PR #737:
URL: https://github.com/apache/commons-text/pull/737#issuecomment-4069721772

   That flat doesn't fly at all


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-16 Thread via GitHub


ron-ladin6 commented on PR #737:
URL: https://github.com/apache/commons-text/pull/737#issuecomment-4069535118

   I used AI tools to help structure the code rather than copying it from a 
textbook. However, if the implementation ends up looking like a direct copy 
from a specific book, I agree that it's better not to include it as-is.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-16 Thread via GitHub


chtompki commented on PR #737:
URL: https://github.com/apache/commons-text/pull/737#issuecomment-4069496396

   We're making changes on an algorithm that we've cited as having been pulled 
from a book...this is questionable. I wonder if we need a new file all 
toghether.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-15 Thread via GitHub


ron-ladin6 commented on PR #737:
URL: https://github.com/apache/commons-text/pull/737#issuecomment-4063708711

   @garydgregory 
   Done! Updated the @since tags to 1.16.0 and added the set prefix to all 
builder methods. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-15 Thread via GitHub


ron-ladin6 commented on PR #737:
URL: https://github.com/apache/commons-text/pull/737#issuecomment-4063524420

   .Hi @garydgregory ,
   
   Thanks for the feedback. I've addressed your comments


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-15 Thread via GitHub


ron-ladin6 commented on PR #737:
URL: https://github.com/apache/commons-text/pull/737#issuecomment-4062884244

   hello @garydgregory ,
   Thank you for the review and the detailed feedback. I appreciate the 
guidance on how to align this with the project's standards.
   I’m on it.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-15 Thread via GitHub


garydgregory commented on code in PR #737:
URL: https://github.com/apache/commons-text/pull/737#discussion_r2936650826


##
src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java:
##
@@ -294,48 +397,90 @@ public LevenshteinDistance() {
 }
 
 /**
- * Constructs a new instance. If the threshold is not null, distance 
calculations will be limited to a maximum length. If the threshold is null, the
- * unlimited version of the algorithm will be used.
+ * Constructs a new instance with the given threshold and all operation 
costs set to 1.
  *
- * @param threshold If this is null then distances calculations will not 
be limited. This may not be negative.
+ * 
+ * If the threshold is not null, distance calculations will be limited to 
a maximum length. If
+ * the threshold is null, the unlimited version of the algorithm will be 
used.
+ * 
+ *
+ * @param threshold If this is null then distances calculations will not 
be limited.
+ *  This may not be negative.
  */
 public LevenshteinDistance(final Integer threshold) {
+this(threshold, DEFAULT_INSERT_COST, DEFAULT_DELETE_COST, 
DEFAULT_REPLACE_COST);
+}
+
+/**
+ * Constructs a new instance with the given threshold and custom operation 
costs.
+ *
+ * 
+ * If the threshold is not null, distance calculations will be limited to 
a maximum value.
+ * If the threshold is null, the unlimited version of the algorithm will 
be used.
+ * 
+ *
+ * 
+ * All cost parameters must be non-negative integers. Passing 0 for a cost 
makes that
+ * operation free; passing values greater than 1 makes it more expensive 
relative to
+ * the other operations.
+ * 
+ *
+ * @param threshold   If this is null then distance calculations will not 
be limited.
+ *This may not be negative.
+ * @param insertCost  the cost of inserting a character, must not be 
negative.
+ * @param deleteCost  the cost of deleting a character, must not be 
negative.
+ * @param replaceCost the cost of replacing (substituting) a character, 
must not be negative.
+ * @throws IllegalArgumentException if threshold is negative, or any cost 
is negative.
+ * @since 1.13.0
+ */
+public LevenshteinDistance(final Integer threshold, final int insertCost, 
final int deleteCost,

Review Comment:
   Use a builder instead, this will avoid constructor inflation and 
deprecations in the future. You should end upt with a private constructor that 
takes the Builder as its input. For an example in this component, see 
`RandomStringGenerator` but here the builder wouldn't need to extend the 
deprecated Builder interface.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-15 Thread via GitHub


garydgregory commented on code in PR #737:
URL: https://github.com/apache/commons-text/pull/737#discussion_r2936650826


##
src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java:
##
@@ -294,48 +397,90 @@ public LevenshteinDistance() {
 }
 
 /**
- * Constructs a new instance. If the threshold is not null, distance 
calculations will be limited to a maximum length. If the threshold is null, the
- * unlimited version of the algorithm will be used.
+ * Constructs a new instance with the given threshold and all operation 
costs set to 1.
  *
- * @param threshold If this is null then distances calculations will not 
be limited. This may not be negative.
+ * 
+ * If the threshold is not null, distance calculations will be limited to 
a maximum length. If
+ * the threshold is null, the unlimited version of the algorithm will be 
used.
+ * 
+ *
+ * @param threshold If this is null then distances calculations will not 
be limited.
+ *  This may not be negative.
  */
 public LevenshteinDistance(final Integer threshold) {
+this(threshold, DEFAULT_INSERT_COST, DEFAULT_DELETE_COST, 
DEFAULT_REPLACE_COST);
+}
+
+/**
+ * Constructs a new instance with the given threshold and custom operation 
costs.
+ *
+ * 
+ * If the threshold is not null, distance calculations will be limited to 
a maximum value.
+ * If the threshold is null, the unlimited version of the algorithm will 
be used.
+ * 
+ *
+ * 
+ * All cost parameters must be non-negative integers. Passing 0 for a cost 
makes that
+ * operation free; passing values greater than 1 makes it more expensive 
relative to
+ * the other operations.
+ * 
+ *
+ * @param threshold   If this is null then distance calculations will not 
be limited.
+ *This may not be negative.
+ * @param insertCost  the cost of inserting a character, must not be 
negative.
+ * @param deleteCost  the cost of deleting a character, must not be 
negative.
+ * @param replaceCost the cost of replacing (substituting) a character, 
must not be negative.
+ * @throws IllegalArgumentException if threshold is negative, or any cost 
is negative.
+ * @since 1.13.0
+ */
+public LevenshteinDistance(final Integer threshold, final int insertCost, 
final int deleteCost,

Review Comment:
   Use a builder instead, this will avoid constructor inflation and 
deprecations in the future. You should end upt with a private constructor that 
takes the Builder as its input.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



Re: [PR] [TEXT-103] Add support for weighted Levenshtein distance [commons-text]

2026-03-15 Thread via GitHub


garydgregory commented on code in PR #737:
URL: https://github.com/apache/commons-text/pull/737#discussion_r2936652131


##
src/test/java/org/apache/commons/text/similarity/LevenshteinDistanceTest.java:
##
@@ -6,7 +6,7 @@
  * (the "License"); you may not use this file except in compliance with
  * the License.  You may obtain a copy of the License at
  *
- *  https://www.apache.org/licenses/LICENSE-2.0

Review Comment:
   Do NOT edit license headers.
   



##
src/test/java/org/apache/commons/text/similarity/LevenshteinDistanceTest.java:
##
@@ -181,4 +181,58 @@ void testGetThresholdDirectlyAfterObjectInstantiation() {
 assertNull(LevenshteinDistance.getDefaultInstance().getThreshold());
 }
 
-}
+// 
-
+// New Weighted Levenshtein Distance Tests
+// 
-
+
+@Test
+void testConstructorWithNegativeCosts() {
+assertThrows(IllegalArgumentException.class, () -> new 
LevenshteinDistance(null, -1, 1, 1));
+assertThrows(IllegalArgumentException.class, () -> new 
LevenshteinDistance(null, 1, -1, 1));
+assertThrows(IllegalArgumentException.class, () -> new 
LevenshteinDistance(null, 1, 1, -1));
+}
+
+@Test
+void testGetLevenshteinDistance_WeightedUnlimited() {
+// Substitution is very expensive (10) vs Insert/Delete (1 each)
+final LevenshteinDistance dist = new LevenshteinDistance(null, 1, 1, 
10);
+// 'a' -> 'b' should choose delete 'a' (1) and insert 'b' (1) = 
distance 2,
+// instead of replace (10).
+assertEquals(2, dist.apply("a", "b"));
+
+// All operations are free (0)
+final LevenshteinDistance freeDist = new LevenshteinDistance(null, 0, 
0, 0);
+assertEquals(0, freeDist.apply("abc", "def"));
+
+// Asymmetric costs: Insert=10, Delete=1, Replace=100
+final LevenshteinDistance asymmetric = new LevenshteinDistance(null, 
10, 1, 100);
+assertEquals(1, asymmetric.apply("a", "")); // Delete 'a' = 1
+assertEquals(10, asymmetric.apply("", "a")); // Insert 'a' = 10
+}
+
+@Test
+void testGetLevenshteinDistance_WeightedThreshold() {
+// Distance is 2 (via delete/insert), threshold is 5 -> result 2
+final LevenshteinDistance weighted = new LevenshteinDistance(5, 1, 1, 
10);
+assertEquals(2, weighted.apply("a", "b"));
+
+// Distance is 2, threshold is 1 -> result -1
+final LevenshteinDistance strict = new LevenshteinDistance(1, 1, 1, 
10);
+assertEquals(-1, strict.apply("a", "b"));
+
+// Empty strings with weighted threshold
+assertEquals(0, new LevenshteinDistance(5, 2, 2, 2).apply("", ""));
+assertEquals(4, new LevenshteinDistance(5, 2, 2, 2).apply("aa", ""));
+assertEquals(-1, new LevenshteinDistance(1, 2, 2, 2).apply("aa", ""));
+}
+
+@Test
+void testWeightedAccessors() {
+final LevenshteinDistance dist = new LevenshteinDistance(10, 2, 3, 4);
+assertEquals(10, dist.getThreshold());
+assertEquals(2, dist.getInsertCost());
+assertEquals(3, dist.getDeleteCost());
+assertEquals(4, dist.getReplaceCost());
+}
+
+}

Review Comment:
   Don't delete the empty line at the end of files.
   



##
src/test/java/org/apache/commons/text/similarity/LevenshteinDistanceTest.java:
##
@@ -181,4 +181,58 @@ void testGetThresholdDirectlyAfterObjectInstantiation() {
 assertNull(LevenshteinDistance.getDefaultInstance().getThreshold());
 }
 
-}
+// 
-

Review Comment:
   Remove these // comments please, they don't help maintain the code.
   Do use Javadoc if you want to call something out.



##
src/main/java/org/apache/commons/text/similarity/LevenshteinDistance.java:
##
@@ -294,48 +397,90 @@ public LevenshteinDistance() {
 }
 
 /**
- * Constructs a new instance. If the threshold is not null, distance 
calculations will be limited to a maximum length. If the threshold is null, the
- * unlimited version of the algorithm will be used.
+ * Constructs a new instance with the given threshold and all operation 
costs set to 1.
  *
- * @param threshold If this is null then distances calculations will not 
be limited. This may not be negative.
+ * 
+ * If the threshold is not null, distance calculations will be limited to 
a maximum length. If
+ * the threshold is null, the unlimited version of the algorithm will be 
used.
+ * 
+ *
+ * @param threshold If this is null then distances calculations will not 
be limited.
+ *  This may not be negative.
  */
 public LevenshteinDistance(final Integer threshold) {
+this(threshold, DEFAULT_INSERT_COST, DEFAULT_DELETE_COS