This is an automated email from the ASF dual-hosted git repository.

git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-dev-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 0725cd7  2025/04/22 05:52:07: Generated dev website from groovy-website@b70a77a
0725cd7 is described below

commit 0725cd749590344aa7632ce8345cab512a47b3b8
Author: jenkins <[email protected]>
AuthorDate: Tue Apr 22 05:52:07 2025 +0000

    2025/04/22 05:52:07: Generated dev website from groovy-website@b70a77a
---
 blog/whisky-revisited.html | 75 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 72 insertions(+), 3 deletions(-)

diff --git a/blog/whisky-revisited.html b/blog/whisky-revisited.html
index 4b5c0bf..0884c5e 100644
--- a/blog/whisky-revisited.html
+++ b/blog/whisky-revisited.html
@@ -204,7 +204,9 @@ The highest correlations are between <em>Smoky</em> and <em>Medicinal</em>, and
 Some, like <em>Floral</em> and <em>Medicinal</em>, are very unrelated.</p>
 </div>
 <div class="paragraph">
-<p>Let&#8217;s now explore searching for whiskies of a particular flavor,
+<p>Groovy has a flexible syntax. Underdog uses this to piggyback on Groovy&#8217;s list notation,
+allowing column expressions to filter data within a dataframe.
+Let&#8217;s use column expressions to find whiskies of a particular flavor,
 in this case profiles that are somewhat <em>fruity</em> and somewhat <em>sweet</em> in flavor.</p>
 </div>
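If you haven't met dataframe-style filtering before, a rough plain-Groovy analogue is `findAll` over a list of maps. This sketch does not use Underdog's column-expression API. The rows are hypothetical toy data, and reading "somewhat" as a score above 1 on the 0 to 4 flavor scale is our assumption.

```groovy
// Hypothetical toy rows with flavor scores on a 0-4 scale (made-up values, not the real dataset).
def rows = [
    [Distillery: 'A', Sweetness: 2, Fruity: 3, Smoky: 1],
    [Distillery: 'B', Sweetness: 1, Fruity: 1, Smoky: 4],
    [Distillery: 'C', Sweetness: 3, Fruity: 2, Smoky: 0],
    [Distillery: 'D', Sweetness: 2, Fruity: 1, Smoky: 2],
]

// Assumption: "somewhat fruity and somewhat sweet" means a score above 1 for both columns.
def fruityAndSweet = rows.findAll { it.Fruity > 1 && it.Sweetness > 1 }

println fruityAndSweet*.Distillery   // prints [A, C]
```

A dataframe library does the same row selection but works column-wise, which is what lets Underdog express the predicate with list-style indexing instead of a per-row closure.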
 <div class="listingblock">
@@ -341,8 +343,51 @@ for (int i in clusters.toSet()) {
 </div>
 </div>
 <div class="paragraph">
-<p>It&#8217;s very hard to visualize 12 dimensional data,
-so let&#8217;s project our data onto 2 dimensions using PCA and store those projections back into the dataframe:</p>
+<p>We might also be interested in the cluster centroids, i.e. the average flavor profiles
+for each cluster. Underdog currently uses Smile under the covers for K-Means clustering.
+The Smile K-Means model already calculates the centroids,
+but that information is currently hidden behind Underdog&#8217;s simplified K-Means abstraction.</p>
+</div>
+<div class="paragraph">
+<p>Nevertheless, it isn&#8217;t hard to recalculate the centroids ourselves:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">def summary = df
+    .agg(features.collectEntries{ f -&gt; [f, 'mean']})
+    .by('Cluster')
+    .sort_values(false, 'Cluster')
+    .rename('Flavour Centroids')</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>We&#8217;ll take the results and do some minor formatting changes:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">(summary.columns - 'Cluster').each { c -&gt;
+    summary[c] = summary[c](Double, Double) { it.round(3) }
+}
+println summary</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This produces the following output:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>                                                                                                      Mean flavor by Cluster
+ Cluster  |  Mean [Body]  |  Mean [Sweetness]  |  Mean [Smoky]  |  Mean [Medicinal]  |  Mean [Tobacco]  |  Mean [Honey]  |  Mean [Spicy]  |  Mean [Winey]  |  Mean [Nutty]  |  Mean [Malty]  |  Mean [Fruity]  |  Mean [Floral]  |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+       0  |         2.76  |              2.44  |          1.44  |              0.04  |               0  |          1.88  |          1.68  |          1.92  |          1.92  |          2.04  |           2.16  |           1.72  |
+       1  |        2.529  |             1.647  |         2.765  |             2.118  |           0.294  |         0.647  |         1.647  |         0.588  |         1.353  |         1.412  |          1.353  |          0.941  |
+       2  |          1.5  |             2.455  |         1.114  |             0.227  |           0.114  |         1.114  |         1.114  |         0.591  |          1.25  |         1.818  |          1.773  |          1.977  |</pre>
+</div>
+</div>
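The aggregation above groups the rows by <code>Cluster</code> and averages each flavor column. As a dependency-free sketch of that same calculation (not Underdog's or Smile's implementation), here it is in plain Groovy with `groupBy` and `collectEntries`. The three features and the rows are hypothetical, chosen only so the arithmetic is easy to follow.

```groovy
// Hypothetical rows: a cluster label plus three flavor scores (made-up values).
def features = ['Body', 'Sweetness', 'Smoky']
def rows = [
    [Cluster: 0, Body: 3, Sweetness: 2, Smoky: 1],
    [Cluster: 0, Body: 2, Sweetness: 3, Smoky: 2],
    [Cluster: 1, Body: 2, Sweetness: 1, Smoky: 4],
    [Cluster: 1, Body: 3, Sweetness: 1, Smoky: 3],
    [Cluster: 1, Body: 1, Sweetness: 2, Smoky: 2],
]

// Group rows by cluster label, then average each feature over that group's members.
def centroids = rows.groupBy { it.Cluster }.collectEntries { cluster, members ->
    def means = features.collectEntries { f ->
        double mean = members.collect { it[f] }.sum() / members.size()
        [f, mean.round(3)]   // round to 3 decimal places, like the formatting step
    }
    [cluster, means]
}.sort()   // order the map by cluster label

centroids.each { cluster, means -> println "$cluster: $means" }
```

Each centroid is simply the component-wise mean of its cluster's members, which is the quantity K-Means uses as the cluster center once it has converged.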
+<div class="paragraph">
+<p>Looking at the centroids is one way to understand how the whiskies have been grouped.
+But it&#8217;s very hard to visualize 12-dimensional data, so instead,
+let&#8217;s project our data onto 2 dimensions using PCA and store those projections back into the dataframe:</p>
 </div>
 <div class="listingblock">
 <div class="content">
@@ -588,6 +633,30 @@ use the normal Groovy extension methods:</p>
 </div>
 </div>
 <div class="paragraph">
+<p>We can also examine the cluster centroids, i.e. the average flavor profiles
+for each cluster. These are available directly from the Smile model (we&#8217;ll denormalize the values
+by multiplying by 4, and then pretty print them to 3 decimal places):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">println 'Cluster ' + features.join(' ')
+model.centers().eachWithIndex { c, i -&gt;
+    println "   $i:   ${c*.multiply(4).collect('%.3f'::formatted).join('  ')}"
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This produces the following output:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre>Cluster Body Sweetness Smoky Medicinal Tobacco Honey Spicy Winey Nutty Malty Fruity Floral
+   0:   1.569  2.392  1.235  0.294  0.098  1.098  1.255  0.608  1.235  1.745  1.784  1.961
+   1:   2.783  2.435  1.478  0.043  0.000  1.913  1.652  2.000  1.957  2.087  2.174  1.696
+   2:   2.833  1.583  2.917  2.583  0.417  0.583  1.417  0.583  1.500  1.500  1.167  0.583</pre>
+</div>
+</div>
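To make the denormalize-and-format step concrete without a trained model, here is a sketch that applies the same scaling by 4 and 3-decimal formatting to a hypothetical hard-coded `centers` array standing in for `model.centers()`. It uses `String.format` with `Locale.ROOT` instead of `'%.3f'::formatted`, an assumption made so the decimal separator doesn't depend on the JVM's default locale.

```groovy
// Hypothetical normalized centroids standing in for model.centers() (made-up values).
double[][] centers = [
    [0.5, 0.25, 0.75] as double[],
    [0.125, 1.0, 0.0] as double[],
]
def features = ['Body', 'Sweetness', 'Smoky']

// Build the header, then one line per cluster: scale each value by 4 and format to 3 places.
def lines = ['Cluster ' + features.join(' ')]
centers.eachWithIndex { c, i ->
    // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale.
    def formatted = c.collect { String.format(Locale.ROOT, '%.3f', it * 4) }
    lines << "   $i:   ${formatted.join('  ')}".toString()
}
lines.each { println it }
```

Fixed-width formatting (always 3 decimals) is what keeps the columns lined up under the header in the printed output.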
+<div class="paragraph">
 <p>We can also project onto two dimensions using Principal Component Analysis (PCA).
 We&#8217;ll again use the
 <a href="https://haifengl.github.io/feature.html#dimension-reduction">Smile</a> functionality for this.
