This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 3572c81  initial draft
3572c81 is described below

commit 3572c81466d500574c5f207d22c8e60bda5bc905
Author: Paul King <[email protected]>
AuthorDate: Thu Apr 17 23:19:46 2025 +1000

    initial draft
---
 site/src/site/blog/img/underdogCorrelationPlot.png | Bin 0 -> 97639 bytes
 site/src/site/blog/img/underdogRadarPlot.png       | Bin 0 -> 224624 bytes
 site/src/site/blog/img/underdogScatterPlot.png     | Bin 0 -> 111853 bytes
 site/src/site/blog/whisky-underdog.adoc            | 148 +++++++++++++++++++++
 4 files changed, 148 insertions(+)

diff --git a/site/src/site/blog/img/underdogCorrelationPlot.png b/site/src/site/blog/img/underdogCorrelationPlot.png
new file mode 100644
index 0000000..2761550
Binary files /dev/null and b/site/src/site/blog/img/underdogCorrelationPlot.png differ
diff --git a/site/src/site/blog/img/underdogRadarPlot.png b/site/src/site/blog/img/underdogRadarPlot.png
new file mode 100644
index 0000000..0b085dd
Binary files /dev/null and b/site/src/site/blog/img/underdogRadarPlot.png differ
diff --git a/site/src/site/blog/img/underdogScatterPlot.png b/site/src/site/blog/img/underdogScatterPlot.png
new file mode 100644
index 0000000..19cac0f
Binary files /dev/null and b/site/src/site/blog/img/underdogScatterPlot.png differ
diff --git a/site/src/site/blog/whisky-underdog.adoc b/site/src/site/blog/whisky-underdog.adoc
new file mode 100644
index 0000000..02ac81d
--- /dev/null
+++ b/site/src/site/blog/whisky-underdog.adoc
@@ -0,0 +1,148 @@
+= A first look at Underdog
+Paul King
+:revdate: 2025-04-17T22:30:00+00:00
+:draft: true
+:keywords: whisky, groovy, kmeans, clustering
+:description: This post looks at using the Underdog data science library.
+
+++++
+<table><tr><td style="padding: 0px; padding-left: 20px; padding-right: 20px; font-size: 18pt; line-height: 1.5; margin: 0px">
+++++
+[blue]#_Let's explore whisky profiles using Underdog!_#
+++++
+</td></tr></table>
+++++
+
+A relatively new data science library is
+https://grooviter.github.io/underdog/[Underdog].
+It has many Groovy-powered features that deliver a very expressive developer experience.
+Let's use it to explore whisky profiles.
+
+Underdog sits on top of some well-known data science libraries like Smile and Tablesaw, so
+if you have used either of those libraries, you'll recognise parts of the functionality.
+
+First, we'll load our CSV file:
+
+[source,groovy]
+----
+def file = new File(getClass().classLoader.getResource('whiskey.csv').file)
+def df = Underdog.df().read_csv(file.path).drop('RowID')
+----
+
+Let's look at the shape and schema of the data:
+
+[source,groovy]
+----
+println df.shape()
+println df.schema()
+----
+
+It gives this output:
+
+----
+86 rows X 13 cols
+        Structure of whiskey.csv
+ Index  |  Column Name  |  Column Type  |
+-----------------------------------------
+     0  |   Distillery  |       STRING  |
+     1  |         Body  |      INTEGER  |
+     2  |    Sweetness  |      INTEGER  |
+     3  |        Smoky  |      INTEGER  |
+     4  |    Medicinal  |      INTEGER  |
+     5  |      Tobacco  |      INTEGER  |
+     6  |        Honey  |      INTEGER  |
+     7  |        Spicy  |      INTEGER  |
+     8  |        Winey  |      INTEGER  |
+     9  |        Nutty  |      INTEGER  |
+    10  |        Malty  |      INTEGER  |
+    11  |       Fruity  |      INTEGER  |
+    12  |       Floral  |      INTEGER  |
+----
+
+Let's look at a correlation matrix plot of the data:
+
+[source,groovy]
+----
+def plot = Underdog.plots()
+def features = df.columns - 'Distillery'
+plot.correlationMatrix(df[features]).show()
+----
+
+It produces this plot:
+
+image:img/underdogCorrelationPlot.png[correlation plot,50%]
+
+We can also look at the data for any individual distillery using a radar plot. Let's plot the distillery in row 0:
+
+[source,groovy]
+----
+def data = df[features] as double[][]
+plot.radar(
+    features,
+    [4] * features.size(),
+    data[0].toList(),
+    df['Distillery'][0]
+).show()
+----
+
+It produces this plot:
+
+image:img/underdogRadarPlot.png[radar plot,50%]
+
+Let's now cluster the distilleries using k-means:
+
+[source,groovy]
+----
+def ml = Underdog.ml()
+def clusters = ml.clustering.kMeans(data, nClusters: 3)
+df['Cluster'] = clusters.toList()
+
+println 'Clusters'
+for (int i in clusters.toSet()) {
+    println "$i:${df[df['Cluster'] == i]['Distillery'].join(', ')}"
+}
+----
+
+It gives the following output:
+
+----
+Clusters
+0:Aberfeldy, Aberlour, Auchroisk, Balmenach, Belvenie, BenNevis, Benrinnes, Benromach, BlairAthol, Dailuaine, Dalmore, Edradour, GlenOrd, Glendronach, Glendullan, Glenfarclas, Glenlivet, Glenrothes, Glenturret, Knochando, Longmorn, Macallan, Mortlach, RoyalLochnagar, Strathisla
+1:Ardbeg, Balblair, Bowmore, Bruichladdich, Caol Ila, Clynelish, GlenGarioch, GlenScotia, Highland Park, Isle of Jura, Lagavulin, Laphroig, Oban, OldPulteney, Springbank, Talisker, Teaninich
+2:AnCnoc, Ardmore, ArranIsleOf, Auchentoshan, Aultmore, Benriach, Bladnoch, Bunnahabhain, Cardhu, Craigallechie, Craigganmore, Dalwhinnie, Deanston, Dufftown, GlenDeveronMacduff, GlenElgin, GlenGrant, GlenKeith, GlenMoray, GlenSpey, Glenallachie, Glenfiddich, Glengoyne, Glenkinchie, Glenlossie, Glenmorangie, Inchgower, Linkwood, Loch Lomond, Mannochmore, Miltonduff, OldFettercairn, RoyalBrackla, Scapa, Speyburn, Speyside, Strathmill, Tamdhu, Tamnavulin, Tobermory, Tomatin, Tomintoul, Tom [...]
+----
+
+Finally, let's project our data onto two dimensions using PCA and plot that as a scatter plot:
+
+[source,groovy]
+----
+def pca = ml.features.pca(data, 2)
+def projected = pca.apply(data)
+
+df['X'] = projected*.getAt(0)
+df['Y'] = projected*.getAt(1)
+
+plot.scatter(
+    df['X'],
+    df['Y'],
+    df['Cluster'],
+    'Whiskey Clusters'
+).show()
+----
+
+The output looks like this:
+
+image:img/underdogScatterPlot.png[scatter plot,50%]
+
+== Further information
+
+* https://grooviter.github.io/underdog/[Underdog]
+
+== Conclusion
+
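+We have looked at how to use Underdog to load a CSV file, plot a correlation matrix and a radar plot, cluster the distilleries with k-means, and project the data onto two dimensions with PCA.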
+
+.Update history
+****
+*17/Apr/2025*: Initial version +
+****
