(groovy-website) branch asf-site updated: minor tweaks

paulk Sun, 01 Sep 2024 19:59:13 -0700

This is an automated email from the ASF dual-hosted git repository.

paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git



The following commit(s) were added to refs/heads/asf-site by this push:
     new c0fe671  minor tweaks
c0fe671 is described below

commit c0fe671f6717bcc61f223369dc1101a002976962
Author: Paul King <[email protected]>
AuthorDate: Mon Sep 2 12:58:08 2024 +1000

    minor tweaks
---
 site/src/site/blog/groovy-graph-databases.adoc | 206 ++++++++++++++++---------
 site/src/site/blog/img/BackstrokeRecord.png    | Bin 0 -> 783265 bytes
 2 files changed, 132 insertions(+), 74 deletions(-)

diff --git a/site/src/site/blog/groovy-graph-databases.adoc b/site/src/site/blog/groovy-graph-databases.adoc
index 186e386..73f6d40 100644
--- a/site/src/site/blog/groovy-graph-databases.adoc
+++ b/site/src/site/blog/groovy-graph-databases.adoc
@@ -5,13 +5,23 @@ Paul King
 :draft: true
 :description: This post illustrates using graph databases with Groovy.
 
+In this blog post, we look at using graph databases with Groovy.
+We'll look at:
+
+* Some advantages of graph database technologies
+* Some features of Groovy which make using such databases a little nicer
+* Code examples for a common case study across 7 interesting graph databases
+
+== Case Study
+
 The Olympics is over for another 4 years. For sports fans, there were many exciting moments.
 Let's look at just one event where the Olympic record was broken several times over the
-last three years. We'll look at the women's 100m backstroke and model the results as a graph database.
+last three years. We'll look at the women's 100m backstroke and model the results using
+graph databases.
 
 Why the women's 100m backstroke? Well, that was a particularly exciting event
 in terms of broken records. In Heat 4 of the Tokyo 2021 Olympics, Kylie Masse broke the record previously
-held by Emily Seebohm at the London 2012 Olympics. A few minutes later in Heat 5, Regan Smith
+held by Emily Seebohm from the London 2012 Olympics. A few minutes later in Heat 5, Regan Smith
 broke the record again. Then in another few minutes in Heat 6, Kaylee McKeown broke the record again.
 On the following day in Semifinal 1, Regan took back the record. Then, on the following
 day in the final, Kaylee reclaimed the record. At the Paris 2024 Olympics,
@@ -19,14 +29,99 @@ Kaylee bettered her own record in the final. Then a few days later,
 Regan lead off the 4 x 100m medley relay and broke the backstroke record swimming the first leg.
 That makes 7 times the record was broken across the 2 games!
 
+image:img/BackstrokeRecord.png[Result of Semifinal1,70%]
+
 We'll have vertices in our graph database corresponding to the swimmers and the swims.
-We'll use the labels `swimmer` and `swim` for these vertices. We'll have relationships
-such as `swam` and `supercedes` between vertices. We'll explore modelling and querying the event
+We'll use the labels `Swimmer` and `Swim` for these vertices. We'll have relationships
+such as `swam` and `supersedes` between vertices.
+We'll explore modelling and querying the event
 information using several graph database technologies.
 
 The examples in this post can be found on
 https://github.com/paulk-asert/groovy-graphdb/[GitHub].
 
+== Why graph databases?
+
+RDBMS systems are many times more popular than graph databases.
+This blog post doesn't aim to convert everyone to use graph databases all the time,
+but we'll show you some examples of when it might make sense and let you make up your own mind.
+
+Graph databases are known for more succinct queries
+and vastly more efficient queries in some scenarios.
+Which scenarios? Usually, it boils down to relationships.
+If there are important relationships between data in your system,
+graph databases might make sense.
+
+As a first example, do you prefer this Cypher query (it's from the TuGraph code we'll see later
+but other technologies are similar):
+
+[source,sql]
+----
+MATCH (sr:Swimmer)-[:swam]->(sm:Swim {at: 'Paris 2024'})
+RETURN DISTINCT sr.country AS country
+----
+
+Or the equivalent SQL query assuming we were storing
+the information in relational tables:
+
+[source,sql]
+----
+SELECT DISTINCT country FROM Swimmer
+LEFT JOIN Swimmer_Swim
+    ON Swimmer.swimmerId = Swimmer_Swim.fkSwimmer
+LEFT JOIN Swim
+    ON Swim.swimId = Swimmer_Swim.fkSwim
+WHERE Swim.at = 'Paris 2024'
+----
+
+This SQL query is typical of what is required when we have a many-to-many relationship
+between our entities, in this case _swimmers_ and _swims_. Many-to-many is required to
+correctly model relay swims like the last record swim (though for brevity, we haven't
+included the other relay swimmers in our dataset). The multiple joins in that query
+can also be notoriously slow for large datasets.
+
+We'll see other examples later too, one being a query involving traversal of relationships.
+Here is the Cypher (again from TuGraph):
+
+[source,sql]
+----
+MATCH (s1:Swim)-[:supersedes*1..10]->(s2:Swim {at: 'London 2012'})
+RETURN s1.at as at, s1.event as event
+----
+
+And the equivalent SQL:
+
+[source,sql]
+----
+WITH RECURSIVE traversed(swimId) AS (
+    SELECT fkNew FROM Supersedes
+    WHERE fkOld IN (
+        SELECT swimId FROM Swim
+        WHERE event = 'Heat 4' AND at = 'London 2012'
+    )
+    UNION ALL
+    SELECT Supersedes.fkNew as swimId
+    FROM traversed as t
+        JOIN Supersedes
+            ON t.swimId = Supersedes.fkOld
+    WHERE t.swimId = swimId
+)
+SELECT at, event FROM Swim
+WHERE swimId IN (SELECT * FROM traversed)
+----
+
+Here we have a `Supersedes` table and a recursive common table expression, `traversed`.
+The details aren't important, but it shows the kind of complexity typically
+required for the kind of relationship traversal we are looking at.
+There are certainly far more complex SQL examples for different kinds of
+traversals like shortest path.
+
+Now, it's time to explore the case study using our different database technologies.
+We tried to pick technologies that seemed reasonably well maintained, had reasonable
+JVM support, and had features that seemed worth showing off. Several we
+selected because they have TinkerPop support. It's a Groovy-based technology
+and will be our first technology to explore.
+
 == Apache TinkerPop
 
 Our first technology to examine is https://tinkerpop.apache.org/[Apache TinkerPop™].
@@ -36,8 +131,9 @@ image:https://tinkerpop.apache.org/img/tinkerpop-splash.png[tinkerpop logo,70%]
 TinkerPop is an open source computing framework for graph databases. It provides
 a common abstraction layer, and a graph query language, called Gremlin.
 This allows you to work with numerous graph database implementations in a consistent way.
-TinkerPop also provides its own graph engine implementation, called TinkerGraph, which is what
-we'll use initially.
+TinkerPop also provides its own graph engine implementation, called TinkerGraph,
+which is what we'll use initially. TinkerPop/Gremlin will be a technology we revisit
+for other databases later.
 
 We'll look at the swims for the medalists and record breakers at the Tokyo 2021 and Paris 2024 Olympics
 in the women's 100m backstroke. For reference purposes, we'll also include the previous swim that
@@ -308,16 +404,39 @@ Node.metaClass {
 }
 ----
 
-Now we use normal Groovy property access for setting the node properties. It looks much cleaner.
+What does this do? The `propertyMissing` lines catch attempts to use Groovy's
+normal property access and funnel them through the `getProperty` and `setProperty` methods.
+The `methodMissing` line means any attempted method calls that we don't recognize
+are treated as relationship creation, so we funnel them through the appropriate
+method call.
+method call.
+
+Now we can use normal Groovy property access for setting the node properties.
+It looks much cleaner.
 We define an edge relationship simply by calling a method having the relationship name.
 
 [source,groovy]
 ----
-km = tx.createNode(label('swimmer'))
+km = tx.createNode(label('Swimmer'))
 km.name = 'Kylie Masse'
 km.country = '🇨🇦'
+----
+
+The code is already a little cleaner, but we can tweak the metaprogramming a little
+more to get rid of the noise associated with the `label` method:
 
-swim2 = tx.createNode(label('swim'))
+[source,groovy]
+----
+Transaction.metaClass {
+    createNode { String labelName -> delegate.createNode(label(labelName)) }
+}
+----
+
+This adds an overload for `createNode` that takes a `String`, and
+node creation is improved again, as we can see here:
+
+[source,groovy]
+----
+swim2 = tx.createNode('Swim')
 swim2.time = 58.17d
 swim2.result = 'First'
 swim2.event = 'Heat 4'
@@ -325,7 +444,7 @@ swim2.at = 'Tokyo 2021'
 km.swam(swim2)
 swim2.supercedes(swim1)
 
-swim3 = tx.createNode(label('swim'))
+swim3 = tx.createNode('Swim')
 swim3.time = 57.72d
 swim3.result = '🥈'
 swim3.event = 'Final'
@@ -333,8 +452,9 @@ swim3.at = 'Tokyo 2021'
 km.swam(swim3)
 ----
 
-The code is certainly a lot cleaner, and it was quite a minimal amount of work to define the necessary
-metaprogramming. With a little bit more work, we could use static metaprogramming techniques.
+The code for relationships is certainly a lot cleaner too,
+and it was quite a minimal amount of work to define the necessary metaprogramming.
+With a little bit more work, we could use static metaprogramming techniques.
 This would give us better IDE completion.
 
 Another interesting topic which we won't elaborate here is stronger type checking for graphs.
@@ -956,68 +1076,6 @@ run('''
 ''')*.asMap().each{ println "$it.at $it.event" }
 ----
 
-.An Aside on Graph Databases
-****
-
-Graph databases are known for more succinct queries
-and vastly more efficient queries in some scenarios.
-Do you prefer this cypher query:
-
-[source,sql]
-----
-MATCH (sr:Swimmer)-[:swam]->(sm:Swim {at: 'Paris 2024'})
-RETURN DISTINCT sr.country AS country
-----
-
-Or the equivalent SQL query assuming we were storing all the information in tables:
-
-[source,sql]
-----
-SELECT DISTINCT country FROM Swimmer
-LEFT JOIN Swimmer_Swim
-    ON Swimmer.swimmerId = Swimmer_Swim.fkSwimmer
-LEFT JOIN Swim
-    ON Swim.swimId = Swimmer_Swim.fkSwim
-WHERE Swim.at = 'Paris 2024'
-----
-
-Here we are assuming a many-to-many relationship between _swimmers_ and _swims_
-which is what is required to correctly model relay swims.
-
-For the traversal case, the difference is even more obvious.
-Here is the cypher:
-
-[source,sql]
-----
-MATCH (s1:Swim)-[:supersedes*1..10]->(s2:Swim {at: 'London 2012'})
-RETURN s1.at as at, s1.event as event
-----
-
-And the equivalent cypher:
-
-[source,sql]
-----
-WITH RECURSIVE traversed(swimId) AS (
-    SELECT fkNew FROM Supersedes
-    WHERE fkOld IN (
-        SELECT swimId FROM Swim
-        WHERE event = 'Heat 4' AND at = 'London 2012'
-    )
-    UNION ALL
-    SELECT Supersedes.fkNew as swimId
-    FROM traversed as t
-        JOIN Supersedes
-            ON t.swimId = Supersedes.fkOld
-    WHERE t.swimId = swimId
-)
-SELECT at, event FROM Swim
-WHERE swimId IN (SELECT * FROM traversed)
-----
-
-Here we have a `Supersedes` table and a recursive SQL function, `traversed`.
-
-****
-
 == Apache HugeGraph
 
 Our final technology is Apache
diff --git a/site/src/site/blog/img/BackstrokeRecord.png b/site/src/site/blog/img/BackstrokeRecord.png
new file mode 100644
index 0000000..c55e62f
Binary files /dev/null and b/site/src/site/blog/img/BackstrokeRecord.png differ
