This is an automated email from the ASF dual-hosted git repository.
paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 6d16358 tweaks for wayang 1.0.0 release
6d16358 is described below
commit 6d16358eb2d84628881c133e9059ecd1d8a6e270
Author: Paul King <[email protected]>
AuthorDate: Fri Feb 21 10:33:03 2025 +1000
tweaks for wayang 1.0.0 release
---
.../site/blog/using-groovy-with-apache-wayang.adoc | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/site/src/site/blog/using-groovy-with-apache-wayang.adoc b/site/src/site/blog/using-groovy-with-apache-wayang.adoc
index 6a65f5f..82ee5ac 100644
--- a/site/src/site/blog/using-groovy-with-apache-wayang.adoc
+++ b/site/src/site/blog/using-groovy-with-apache-wayang.adoc
@@ -8,6 +8,11 @@ Paul King <paulk-asert|PMC_Member>
:keywords: centroids, data science, groovy, kmeans, records, apache spark, apache wayang
:description: This post looks at using Apache Wayang and Apache Spark with Apache Groovy to cluster various Whiskies.
+> In the quest to find the perfect single-malt Scotch whisky,
+> let's use Apache Wayang's cross-platform data processing and
+> cross-platform machine learning capabilities to cluster
+> related whiskies by their flavour profile.
+
image:https://www.apache.org/logos/res/wayang/default.png[wayang logo,100,float="right"]
https://wayang.apache.org/[Apache Wayang] (incubating) is an API
for big data cross-platform processing. It provides an abstraction
@@ -30,7 +35,7 @@ The whiskies produced from
https://www.niss.org/sites/default/files/ScotchWhisky01.txt[86 distilleries]
have been ranked by expert tasters according to 12 criteria
(Body, Sweetness, Malty, Smoky, Fruity, etc.).
-We'll use a KMeans algorithm to calculate the centroids.
+We'll use a https://en.wikipedia.org/wiki/K-means_clustering[KMeans] algorithm to calculate the centroids.
This is similar to the
https://github.com/apache/incubator-wayang/blob/main/README.md#k-means[KMeans example in the Wayang documentation]
but instead of 2 dimensions (x and y coordinates), we have 12
@@ -51,7 +56,18 @@ is the notional "point" in the middle of the cluster. For us,
it reflects the typical measure of each criteria for a whiskey
in that cluster.
-== Implementation Details
+== Implementing a distributed KMeans
+
+We'll start by using Wayang's data processing capabilities
+to write our own distributed KMeans algorithm.
+We'll circle back to look at the new built-in KMeans
+that is part of Wayang's ML4all module.
+
+To build a distributed KMeans algorithm, we'll need to
+pass around some information between the processing nodes
+on whatever data processing platform (e.g. Apache Spark)
+that we'll eventually use to run our application.
+So, we first define those data structures.
We'll start with defining a Point record: