This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new f6cf0c1  a few more images
f6cf0c1 is described below

commit f6cf0c1e6c10a2ade9c308354635815c5a1506fa
Author: Paul King <[email protected]>
AuthorDate: Tue Feb 18 13:16:37 2025 +1000

    a few more images
---
 site/src/site/blog/groovy-text-similarity.adoc | 257 ++++++++++++++++---------
 site/src/site/blog/img/gameBubble.png          | Bin 0 -> 157410 bytes
 site/src/site/blog/img/semantle.png            | Bin 0 -> 74004 bytes
 site/src/site/blog/img/wordle.png              | Bin 0 -> 95851 bytes
 4 files changed, 171 insertions(+), 86 deletions(-)

diff --git a/site/src/site/blog/groovy-text-similarity.adoc b/site/src/site/blog/groovy-text-similarity.adoc
index 9f31134..21e8e2c 100644
--- a/site/src/site/blog/groovy-text-similarity.adoc
+++ b/site/src/site/blog/groovy-text-similarity.adoc
@@ -612,7 +612,8 @@ hippo|hippopotamus  50%            40%            40%
 == Going Deeper
 
 Rather than finding similarity based on a word's individual letters, or 
phonetic mappings,
-_machine learning_/_deep learning_ tries to relate words with similar semantic 
meaning. The approach maps each word (or phrase) in n-dimensional space (called 
a _word vector_ or _word embedding_).
+_machine learning_ and _deep learning_ try to relate words with similar 
semantic meaning.
+The approach maps each word (or phrase) in n-dimensional space (called a _word 
vector_ or _word embedding_).
 Related words tend to cluster in similar positions within that space.
 Typically rule-based, statistical, or neural-based approaches are used to 
perform the embedding
 and distance measures like 
https://en.wikipedia.org/wiki/Cosine_similarity[cosine similarity]
@@ -653,16 +654,18 @@ and can then call methods like `similarity` and 
`wordsNearest` as shown here:
 
 [source,groovy]
 ----
-var path = Paths.get(ConceptNet.classLoader.getResource('glove-wiki-gigaword-300.bin').toURI()).toFile()
+var modelName = 'glove-wiki-gigaword-300.bin'
+var path = Paths.get(ConceptNet.classLoader.getResource(modelName).toURI()).toFile()
 Word2Vec model = WordVectorSerializer.readWord2VecModel(path)
 String[] words = ['bull', 'calf', 'bovine', 'cattle', 'livestock', 'horse']
 println """GloVe similarity to cow: ${
     words
         .collectEntries { [it, model.similarity('cow', it)] }
         .sort { -it.value }
-        .collectValues{ sprintf '%4.2f', it }
-}"""
-println "Nearest words in vocab: " + model.wordsNearest('cow', 4)
+        .collectValues('%4.2f'::formatted)
+}
+Nearest words in vocab: ${model.wordsNearest('cow', 4)}
+"""
 ----
 
 Which gives this output:
@@ -674,17 +677,128 @@ Nearest words in vocab: [cows, mad, bovine, cattle]
 
 === FastText
 
-We can swap to a https://fasttext.cc/[FastText] model. We used [this model] 
which has
+We can swap to a https://fasttext.cc/[FastText] model, simply by switching to 
that model:
+
+[source,groovy]
+----
+var modelName = 'fasttext-wiki-news-subwords-300.bin'
+----
+
+We used https://huggingface.co/fse/fasttext-wiki-news-subwords-300[this model] 
which has
 1 million word vectors trained on Wikipedia 2017, UMBC webbase corpus and 
statmt.org news dataset (16B tokens).
 
-It has this output:
+When run with the FastText model, the script has this output:
 
 ----
 FastText similarity to cow: [bovine:0.72, cattle:0.70, calf:0.67, bull:0.67, 
livestock:0.61, horse:0.60]
 Nearest words in vocab: [cows, goat, pig, bovine]
 ----
 
-Using DJL with PyTorch and the Angle model:
+=== ConceptNet
+
+Similarly, we can switch to a ConceptNet model through a change of the model 
name.
+This model also supports multiple languages and incorporates the language used 
into terms, e.g. for English,
+we use "/c/en/cow" instead of "cow":
+
+[source,groovy]
+----
+var modelName = 'conceptnet-numberbatch-17-06-300.bin'
+...
+println """ConceptNet similarity to /c/en/cow: ${
+    words
+        .collectEntries { ["/c/en/$it", model.similarity('/c/en/cow', "/c/en/$it")] }
+        .sort { -it.value }
+        .collectValues('%4.2f'::formatted)
+}
+Nearest words in vocab: ${model.wordsNearest('/c/en/cow', 4)}
+"""
+----
+
+----
+ConceptNet similarity to /c/en/cow: [/c/en/bovine:0.77, /c/en/cattle:0.77, 
/c/en/livestock:0.63, /c/en/bull:0.54, /c/en/calf:0.53, /c/en/horse:0.50]
+Nearest words in vocab: [/c/ast/vaca, /c/be/карова, /c/ur/گای, /c/gv/booa]
+----
+
+There are benefits and costs to using a multilingual model. The model itself is bigger and takes longer to load.
+It will typically need more memory to use, but it does let us consider multilingual options if we want to,
+as the following results show:
+
+----
+Algorithm            conceptnet
+
+               /c/fr/vache    █████████▏
+               /c/de/kuh      █████████▏
+/c/en/cow      /c/en/bovine   ███████▏
+               /c/fr/bovin    ███████▏
+               /c/en/bull     █████▏
+
+               /c/fr/taureau  █████████▏
+               /c/en/cow      █████▏
+/c/en/bull     /c/fr/vache    █████▏
+               /c/de/kuh      █████▏
+               /c/fr/bovin    █████▏
+
+               /c/de/kuh      █████▏
+               /c/en/cow      █████▏
+/c/en/calf     /c/fr/vache    █████▏
+               /c/en/bovine   █████▏
+               /c/fr/bovin    █████▏
+
+               /c/fr/bovin    █████████▏
+               /c/en/cow      ███████▏
+/c/en/bovine   /c/de/kuh      ███████▏
+               /c/fr/vache    ███████▏
+               /c/en/calf     █████▏
+
+               /c/en/bovine   █████████▏
+               /c/fr/vache    ███████▏
+/c/fr/bovin    /c/de/kuh      ███████▏
+               /c/en/cow      ███████▏
+               /c/fr/taureau  █████▏
+
+               /c/en/cow      █████████▏
+               /c/de/kuh      █████████▏
+/c/fr/vache    /c/fr/bovin    ███████▏
+               /c/en/bovine   ███████▏
+               /c/fr/taureau  █████▏
+
+               /c/en/bull     █████████▏
+               /c/fr/bovin    █████▏
+/c/fr/taureau  /c/fr/vache    █████▏
+               /c/en/cow      █████▏
+               /c/de/kuh      █████▏
+
+               /c/en/cow      █████████▏
+               /c/fr/vache    █████████▏
+/c/de/kuh      /c/fr/bovin    ███████▏
+               /c/en/bovine   ███████▏
+               /c/en/calf     █████▏
+
+               /c/en/cat      ████████▏
+               /c/de/katze    ████████▏
+/c/en/kitten   /c/en/bull     ██▏
+               /c/en/cow      █▏
+               /c/de/kuh      █▏
+
+               /c/de/katze    █████████▏
+               /c/en/kitten   ████████▏
+/c/en/cat      /c/en/bull     ██▏
+               /c/en/cow      ██▏
+               /c/fr/taureau  █▏
+
+               /c/en/cat      █████████▏
+               /c/en/kitten   ████████▏
+/c/de/katze    /c/en/bull     ██▏
+               /c/de/kuh      ██▏
+               /c/fr/taureau  ██▏
+----
+
+We won't use this feature for our game, but it would be a great thing to add
+if you speak multiple languages or if you were learning a new language.
+
+=== AnglE
+
+Using DJL with PyTorch and the AnglE model:
 
 ----
     cow
@@ -737,6 +851,8 @@ One two three (0.43)
 bovine (0.39)
 ----
 
+=== UAE
+
 Using DJL with Tensorflow and the UAE model:
 
 ----
@@ -794,75 +910,8 @@ image:img/AnimalSemanticSimilarity.png[]
 
 image:img/AnimalSemanticMeaningPcaBubblePlot.png[]
 
-----
-Algorithm            conceptnet
-
-               /c/fr/vache    █████████▏
-               /c/de/kuh      █████████▏
-/c/en/cow      /c/en/bovine   ███████▏
-               /c/fr/bovin    ███████▏
-               /c/en/bull     █████▏
-
-               /c/fr/taureau  █████████▏
-               /c/en/cow      █████▏
-/c/en/bull     /c/fr/vache    █████▏
-               /c/de/kuh      █████▏
-               /c/fr/bovin    █████▏
-
-               /c/de/kuh      █████▏
-               /c/en/cow      █████▏
-/c/en/calf     /c/fr/vache    █████▏
-               /c/en/bovine   █████▏
-               /c/fr/bovin    █████▏
-
-               /c/fr/bovin    █████████▏
-               /c/en/cow      ███████▏
-/c/en/bovine   /c/de/kuh      ███████▏
-               /c/fr/vache    ███████▏
-               /c/en/calf     █████▏
-
-               /c/en/bovine   █████████▏
-               /c/fr/vache    ███████▏
-/c/fr/bovin    /c/de/kuh      ███████▏
-               /c/en/cow      ███████▏
-               /c/fr/taureau  █████▏
 
-               /c/en/cow      █████████▏
-               /c/de/kuh      █████████▏
-/c/fr/vache    /c/fr/bovin    ███████▏
-               /c/en/bovine   ███████▏
-               /c/fr/taureau  █████▏
-
-               /c/en/bull     █████████▏
-               /c/fr/bovin    █████▏
-/c/fr/taureau  /c/fr/vache    █████▏
-               /c/en/cow      █████▏
-               /c/de/kuh      █████▏
-
-               /c/en/cow      █████████▏
-               /c/fr/vache    █████████▏
-/c/de/kuh      /c/fr/bovin    ███████▏
-               /c/en/bovine   ███████▏
-               /c/en/calf     █████▏
-
-               /c/en/cat      ████████▏
-               /c/de/katze    ████████▏
-/c/en/kitten   /c/en/bull     ██▏
-               /c/en/cow      █▏
-               /c/de/kuh      █▏
-
-               /c/de/katze    █████████▏
-               /c/en/kitten   ████████▏
-/c/en/cat      /c/en/bull     ██▏
-               /c/en/cow      ██▏
-               /c/fr/taureau  █▏
-
-               /c/en/cat      █████████▏
-               /c/en/kitten   ████████▏
-/c/de/katze    /c/en/bull     ██▏
-               /c/de/kuh      ██▏
-               /c/fr/taureau  ██▏
-----
+=== Comparing Algorithm Choices
 
 ----
 Algorithm       angle                use                  conceptnet           
glove                fasttext
@@ -1032,7 +1081,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 4): budding
 LongestCommonSubsequence       6
 Levenshtein                    Distance: 1, Insert: 0, Delete: 0, Substitute: 1
-Jaccard                        71%  (5/7)
+Jaccard                        71%
 JaroWinkler                    PREFIX 90% / SUFFIX 96%
 Phonetic                       Metaphone=BTNK 79% / Soundex=B352 75%
 Meaning                        Angle 52% / Use 35% / ConceptNet 2% / GloVe 4% 
/ FastText 25%
@@ -1048,7 +1097,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 5): pudding
 LongestCommonSubsequence       7
 Levenshtein                    Distance: 0, Insert: 0, Delete: 0, Substitute: 0
-Jaccard                        100%  (6/6)
+Jaccard                        100%
 JaroWinkler                    PREFIX 100% / SUFFIX 100%
 Phonetic                       Metaphone=PTNK 100% / Soundex=P352 100%
 Meaning                        Angle 100% / Use 100% / ConceptNet 100% / GloVe 
100% / FastText 100%
@@ -1063,7 +1112,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 1): bail
 LongestCommonSubsequence       1
 Levenshtein                    Distance: 7, Insert: 4, Delete: 0, Substitute: 3
-Jaccard                        22%  (2/9) 2 / 9
+Jaccard                        22%
 JaroWinkler                    PREFIX 42% / SUFFIX 46%
 Phonetic                       Metaphone=BL 38% / Soundex=B400 25%
 Meaning                        Angle 46% / Use 40% / ConceptNet 0% / GloVe 0% 
/ FastText 31%
@@ -1076,7 +1125,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 2): leg
 LongestCommonSubsequence       2
 Levenshtein                    Distance: 6, Insert: 5, Delete: 0, Substitute: 1
-Jaccard                        25%  (2/8) 1 / 4
+Jaccard                        25%
 JaroWinkler                    PREFIX 47% / SUFFIX 0%
 Phonetic                       Metaphone=LK 38% / Soundex=L200 0%
 Meaning                        Angle 50% / Use 18% / ConceptNet 11% / GloVe 
13% / FastText 37%
@@ -1093,7 +1142,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 3): languish
 LongestCommonSubsequence       2
 Levenshtein                    Distance: 8, Insert: 0, Delete: 0, Substitute: 8
-Jaccard                        15%  (2/13) 2 / 13
+Jaccard                        15%
 JaroWinkler                    PREFIX 50% / SUFFIX 50%
 Phonetic                       Metaphone=LNKX 34% / Soundex=L522 0%
 Meaning                        Angle 46% / Use 12% / ConceptNet -11% / GloVe 
-4% / FastText 25%
@@ -1106,7 +1155,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 4): election
 LongestCommonSubsequence       5
 Levenshtein                    Distance: 4, Insert: 0, Delete: 0, Substitute: 4
-Jaccard                        40%  (4/10) 2 / 5
+Jaccard                        40%
 JaroWinkler                    PREFIX 83% / SUFFIX 75%
 Phonetic                       Metaphone=ELKXN 50% / Soundex=E423 75%
 Meaning                        Angle 47% / Use 13% / ConceptNet -5% / GloVe 
-7% / FastText 26%
@@ -1122,7 +1171,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 5): elevator
 LongestCommonSubsequence       8
 Levenshtein                    Distance: 0, Insert: 0, Delete: 0, Substitute: 0
-Jaccard                        100%  (7/7) 1
+Jaccard                        100%
 JaroWinkler                    PREFIX 100% / SUFFIX 100%
 Phonetic                       Metaphone=ELFTR 100% / Soundex=E413 100%
 Meaning                        Angle 100% / Use 100% / ConceptNet 100% / GloVe 
100% / FastText 100%
@@ -1160,7 +1209,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 2): coarse
 LongestCommonSubsequence       3
 Levenshtein                    Distance: 4, Insert: 0, Delete: 0, Substitute: 4
-Jaccard                        57%  (4/7) 4 / 7
+Jaccard                        57%
 JaroWinkler                    PREFIX 67% / SUFFIX 67%
 Phonetic                       Metaphone=KRS 74% / Soundex=C620 75%
 Meaning                        Angle 51% / Use 12% / ConceptNet 5% / GloVe 23% 
/ FastText 26%
@@ -1181,7 +1230,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 3): roasts
 LongestCommonSubsequence       3
 Levenshtein                    Distance: 6, Insert: 0, Delete: 0, Substitute: 6
-Jaccard                        67%  (4/6) 2 / 3
+Jaccard                        67%
 JaroWinkler                    PREFIX 56% / SUFFIX 56%
 Phonetic                       Metaphone=RSTS 61% / Soundex=R232 25%
 Meaning                        Angle 54% / Use 25% / ConceptNet 18% / GloVe 
18% / FastText 31%
@@ -1201,7 +1250,7 @@ Possible letters: a b c d e f g h i j k l m n o p q r s t 
u v w x y z
 Guess the hidden word (turn 4): carrot
 LongestCommonSubsequence       6
 Levenshtein                    Distance: 0, Insert: 0, Delete: 0, Substitute: 0
-Jaccard                        100%  (5/5) 1
+Jaccard                        100%
 JaroWinkler                    PREFIX 100% / SUFFIX 100%
 Phonetic                       Metaphone=KRT 100% / Soundex=C630 100%
 Meaning                        Angle 100% / Use 100% / ConceptNet 100% / GloVe 
100% / FastText 100%
@@ -1211,6 +1260,42 @@ Congratulations, you guessed correctly!
 
 Success!
 
+=== Hints
+
+Some word guessing games allow the player to ask for hints.
+For our game, we decided to provide hints at regular intervals,
+giving stronger hints as the game progressed. We used the
+20 nearest similar words as returned by the `wordsNearest` method
+for the three word2vec models and then selected a subset.
+
+Although not needed in the games we have shown,
+here are the hints that would have been given for Round 3.
+
+----
+After round 8: root_vegetable, daucus
+After round 16: diced, cauliflower, cucumber
+After round 24: celery, onion, sticks, zucchini
+----
+
+=== Further Evolution
+
+Our goal was to introduce you to a number of algorithms that you might use in a word game,
+rather than create a fully polished game. If we were going to progress such a game, one of the
+challenges would be how to represent the large number of parameters to the user after each round.
+We could work on some pretty bar charts like in https://semantle.com/[Semantle]:
+
+image:img/semantle.png[semantle game,width=50%]
+
+And we could add a prettier representation of available letters, e.g. greyed out keys on a keyboard, like in https://www.nytimes.com/games/wordle/index.html[Wordle]:
+
+image:img/wordle.png[wordle game,width=30%]
+
+But we might also just use a bubble-chart, like we showed earlier,
+and let data science condense the results for us. We might end up with
+a chart something like this (some guesses and hints for Round 3 shown):
+
+image:img/gameBubble.png[Game BubbleChart,width=70%]
+
 == Further information [[further_info]]
 
 Source code for this post:
diff --git a/site/src/site/blog/img/gameBubble.png b/site/src/site/blog/img/gameBubble.png
new file mode 100644
index 0000000..0889b35
Binary files /dev/null and b/site/src/site/blog/img/gameBubble.png differ
diff --git a/site/src/site/blog/img/semantle.png b/site/src/site/blog/img/semantle.png
new file mode 100644
index 0000000..0e20db8
Binary files /dev/null and b/site/src/site/blog/img/semantle.png differ
diff --git a/site/src/site/blog/img/wordle.png b/site/src/site/blog/img/wordle.png
new file mode 100644
index 0000000..29ccd04
Binary files /dev/null and b/site/src/site/blog/img/wordle.png differ
