(groovy-website) branch asf-site updated: additional descriptions

paulk Sat, 01 Feb 2025 23:42:05 -0800

This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new a544ea0  additional descriptions
a544ea0 is described below

commit a544ea02ae6bb12b31d1c2fe7cefcd44553514aa
Author: Paul King <[email protected]>
AuthorDate: Sun Feb 2 17:41:51 2025 +1000

    additional descriptions
---
 site/src/site/blog/groovy-text-similarity.adoc | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/site/src/site/blog/groovy-text-similarity.adoc b/site/src/site/blog/groovy-text-similarity.adoc
index ac79424..79dfa72 100644
--- a/site/src/site/blog/groovy-text-similarity.adoc
+++ b/site/src/site/blog/groovy-text-similarity.adoc
@@ -78,14 +78,16 @@ Then we'll look at some libraries for phonetic matching:
 
 Then we'll look at some deep learning options for increased semantic matching:
 
-* `org.deeplearning4j:deeplearning4j-nlp` for Glove and ConceptNet models
+* `org.deeplearning4j:deeplearning4j-nlp` for Glove, ConceptNet, and FastText models
 * `ai.djl` with Pytorch for a universal-sentence-encoder model and Tensorflow with an Angle model
 
 == Simple String Metrics
 
-String metrics provide some sort of measure of the sameness of the characters in words (or phrases). These algorithms generally compute similarity or distance (inverse similarity).
+String metrics provide some sort of measure of the sameness of the characters in words (or phrases).
+These algorithms generally compute similarity or distance (inverse similarity).
 
-There are numerous tutorials that describe various string metric algorithms. We won't replicate those tutorials but here is a summary of some common ones:
+There are numerous tutorials that describe various string metric algorithms.
+We won't replicate those tutorials but here is a summary of some common ones:
 
 [cols="2,7"]
 |===
@@ -103,7 +105,6 @@ is a variant that allows transposition of two adjacent letters to count as a sin
 characters in a word, or words in a sentence, or sets of `k` consecutive characters in a phrase.
 The ratio is the _intersection_ of sets divided by the _union_ of sets.
 `bear` vs `bare` would be 100%, `pair` vs `pear` would be 60%.
-
 | https://en.wikipedia.org/wiki/Hamming_distance[Hamming]
 | Similar to Levenshtein but insertions and deletions aren't allowed.
 Distance between `black` and `block` is 1 (swap `o` for `a`).
@@ -123,11 +124,14 @@ JaroWinkler of `ground` and `rgound` (first two letters swapped) is 0.94.
 
 |===
 
-You may be wondering what practical use these algorithms might have.
-Longest commons subsequence is the algorithm behind the popular `diff` tool.
+You may be wondering what practical use these algorithms might have. Here are just a few use cases:
+
+* Longest common subsequence is the algorithm behind the popular `diff` tool
+* Hamming distance is an important metric when designing algorithms for error detection, error correction and checksums
+* Levenshtein is used in search engines (like Apache Lucene and Apache Solr)
+for fuzzy matching searches and for spelling correction software
 
-Groovy has in fact a built-in example of a variant of the Levenshtein measure
-it uses for error reporting. Groovy uses a variant known as the Damerau-Levenshtein distance.
+Groovy has in fact a built-in example of using the Damerau-Levenshtein distance metric.
 This variant counts transposing two adjacent characters within the original word as one "edit".
 The Levenshtein distance of `fish` and `ifsh` is 2.
 The Damerau-Levenshtein distance of `fish` and `ifsh` is 1.
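
The two distances quoted at the end of the diff can be checked with a small sketch. This is an illustration only: it is not part of the commit and not Groovy's internal implementation. The class and method names (`EditDistanceSketch`, `levenshtein`, `damerauLevenshtein`) are hypothetical, and the transposition case follows the restricted "optimal string alignment" form of Damerau-Levenshtein, which is assumed to be enough for this example.

```java
public class EditDistanceSketch {

    // Levenshtein distance: counts insertions, deletions and substitutions.
    public static int levenshtein(String a, String b) {
        return distance(a, b, false);
    }

    // Restricted Damerau-Levenshtein (optimal string alignment):
    // additionally counts a swap of two adjacent characters as one edit.
    public static int damerauLevenshtein(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean allowTransposition) {
        // d[i][j] = edits needed to turn the first i chars of a into the first j chars of b
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1); // delete / insert
                best = Math.min(best, d[i - 1][j - 1] + cost);         // substitute or match
                if (allowTransposition && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);        // adjacent swap
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("fish", "ifsh"));        // prints 2
        System.out.println(damerauLevenshtein("fish", "ifsh")); // prints 1
    }
}
```

The only difference between the two methods is the extra transposition case, which is why `fish` vs `ifsh` drops from 2 edits to 1.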
