This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-dev-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 0725cd7 2025/04/22 05:52:07: Generated dev website from
groovy-website@b70a77a
0725cd7 is described below
commit 0725cd749590344aa7632ce8345cab512a47b3b8
Author: jenkins <[email protected]>
AuthorDate: Tue Apr 22 05:52:07 2025 +0000
2025/04/22 05:52:07: Generated dev website from groovy-website@b70a77a
---
blog/whisky-revisited.html | 75 ++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 72 insertions(+), 3 deletions(-)
diff --git a/blog/whisky-revisited.html b/blog/whisky-revisited.html
index 4b5c0bf..0884c5e 100644
--- a/blog/whisky-revisited.html
+++ b/blog/whisky-revisited.html
@@ -204,7 +204,9 @@ The highest correlations are between <em>Smoky</em> and
<em>Medicinal</em>, and
Some, like <em>Floral</em> and <em>Medicinal</em>, are very unrelated.</p>
</div>
<div class="paragraph">
-<p>Let’s now explore searching for whiskies of a particular flavor,
+<p>Groovy has a flexible syntax. Underdog piggybacks on Groovy’s list notation
+to allow column expressions for filtering data within a dataframe.
+Let’s use column expressions to find whiskies of a particular flavor,
in this case profiles that are somewhat <em>fruity</em> and somewhat
<em>sweet</em> in flavor.</p>
</div>
<div class="listingblock">
@@ -341,8 +343,51 @@ for (int i in clusters.toSet()) {
</div>
</div>
<div class="paragraph">
-<p>It’s very hard to visualize 12 dimensional data,
-so let’s project our data onto 2 dimensions using PCA and store those
projections back into the dataframe:</p>
+<p>We might also be interested in the cluster centroids, i.e. the average flavor profiles
+for each cluster. Under the covers, Underdog currently uses Smile
+for K-Means clustering. The Smile K-Means model already calculates the centroids,
+but for now that information is hidden behind Underdog’s simplified K-Means abstraction.</p>
+</div>
+<div class="paragraph">
+<p>Nevertheless, it isn’t hard to recalculate the centroids
ourselves:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">def summary = df
+ .agg(features.collectEntries{ f -> [f, 'mean']})
+ .by('Cluster')
+ .sort_values(false, 'Cluster')
+ .rename('Flavour Centroids')</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We’ll apply some minor formatting to the results, rounding each mean to 3 decimal places:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">(summary.columns - 'Cluster').each { c ->
+ summary[c] = summary[c](Double, Double) { it.round(3) }
+}
+println summary</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Which has this output:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Mean flavor by Cluster
+ Cluster | Mean [Body] | Mean [Sweetness] | Mean [Smoky] | Mean [Medicinal] | Mean [Tobacco] | Mean [Honey] | Mean [Spicy] | Mean [Winey] | Mean [Nutty] | Mean [Malty] | Mean [Fruity] | Mean [Floral] |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+ 0 | 2.76 | 2.44 | 1.44 | 0.04 | 0 | 1.88 | 1.68 | 1.92 | 1.92 | 2.04 | 2.16 | 1.72 |
+ 1 | 2.529 | 1.647 | 2.765 | 2.118 | 0.294 | 0.647 | 1.647 | 0.588 | 1.353 | 1.412 | 1.353 | 0.941 |
+ 2 | 1.5 | 2.455 | 1.114 | 0.227 | 0.114 | 1.114 | 1.114 | 0.591 | 1.25 | 1.818 | 1.773 | 1.977 |</pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Looking at the centroids is one way to understand how the whiskies have been grouped.
+But it’s very hard to visualize 12-dimensional data, so instead,
+let’s project our data onto 2 dimensions using PCA and store those projections back into the dataframe:</p>
</div>
<div class="listingblock">
<div class="content">
@@ -588,6 +633,30 @@ use the normal Groovy extension methods:</p>
</div>
</div>
<div class="paragraph">
+<p>We can also look at the cluster centroids, i.e. the average flavor profiles
+for each cluster. These are available from the Smile model (we’ll denormalize the values
+by multiplying by 4, and then pretty print them to 3 decimal places):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">println 'Cluster ' + features.join(' ')
+model.centers().eachWithIndex { c, i ->
+ println " $i: ${c*.multiply(4).collect('%.3f'::formatted).join(' ')}"
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Which has this output:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Cluster Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey Nutty Malty Fruity Floral
+ 0: 1.569 2.392 1.235 0.294 0.098 1.098 1.255 0.608 1.235 1.745 1.784 1.961
+ 1: 2.783 2.435 1.478 0.043 0.000 1.913 1.652 2.000 1.957 2.087 2.174 1.696
+ 2: 2.833 1.583 2.917 2.583 0.417 0.583 1.417 0.583 1.500 1.500 1.167 0.583</pre>
+</div>
+</div>
+<div class="paragraph">
<p>We can also project onto two dimensions using Principal Component Analysis
(PCA).
We’ll again use the
<a
href="https://haifengl.github.io/feature.html#dimension-reduction">Smile</a>
functionality for this.