This is an automated email from the ASF dual-hosted git repository.
git-site-role pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-dev-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2e98b0c 2024/09/02 03:22:46: Generated dev website from
groovy-website@c0fe671
2e98b0c is described below
commit 2e98b0c34516ab7f1594e4c78d2660f2ec7ae02a
Author: jenkins <[email protected]>
AuthorDate: Mon Sep 2 03:22:46 2024 +0000
2024/09/02 03:22:46: Generated dev website from groovy-website@c0fe671
---
blog/groovy-graph-databases.html | 247 ++++++++++++++++++++++++++-------------
blog/img/BackstrokeRecord.png | Bin 0 -> 783265 bytes
2 files changed, 164 insertions(+), 83 deletions(-)
diff --git a/blog/groovy-graph-databases.html b/blog/groovy-graph-databases.html
index e3da5e1..fad38ba 100644
--- a/blog/groovy-graph-databases.html
+++ b/blog/groovy-graph-databases.html
@@ -53,17 +53,40 @@
</ul>
</div>
</div>
- </div><div id='content' class='page-1'><div
class='row'><div class='row-fluid'><div class='col-lg-3'><ul
class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a
href='#doc'>Using Graph Databases with Groovy</a></li><li><a
href='#_apache_tinkerpop' class='anchor-link'>Apache TinkerPop</a></li><li><a
href='#_neo4j' class='anchor-link'>Neo4j</a></li><li><a href='#_apache_age'
class='anchor-link'>Apache AGE</a></li><li><a href='#_orientdb' class= [...]
+ </div><div id='content' class='page-1'><div
class='row'><div class='row-fluid'><div class='col-lg-3'><ul
class='nav-sidebar'><li><a href='./'>Blog index</a></li><li class='active'><a
href='#doc'>Using Graph Databases with Groovy</a></li><li><a
href='#_case_study' class='anchor-link'>Case Study</a></li><li><a
href='#_why_graph_databases' class='anchor-link'>Why graph
databases?</a></li><li><a href='#_apache_tinkerpop' class='anchor-link'>Apache
TinkerPop</a></li><l [...]
+<div class="sectionbody">
+<div class="paragraph">
+<p>In this blog post, we look at using graph databases with Groovy.
+We’ll look at:</p>
+</div>
+<div class="ulist">
+<ul>
+<li>
+<p>Some advantages of graph database technologies</p>
+</li>
+<li>
+<p>Some features of Groovy which make using such databases a little nicer</p>
+</li>
+<li>
+<p>Code examples for a common case study across 7 interesting graph
databases</p>
+</li>
+</ul>
+</div>
+</div>
+</div>
+<div class="sect1">
+<h2 id="_case_study">Case Study</h2>
<div class="sectionbody">
<div class="paragraph">
<p>The Olympics is over for another 4 years. For sports fans, there were many
exciting moments.
Let’s look at just one event where the Olympic record was broken several
times over the
-last three years. We’ll look at the women’s 100m backstroke and
model the results as a graph database.</p>
+last three years. We’ll look at the women’s 100m backstroke and
model the results using
+graph databases.</p>
</div>
<div class="paragraph">
<p>Why the women’s 100m backstroke? Well, that was a particularly
exciting event
in terms of broken records. In Heat 4 of the Tokyo 2021 Olympics, Kylie Masse
broke the record previously
-held by Emily Seebohm at the London 2012 Olympics. A few minutes later in Heat
5, Regan Smith
+held by Emily Seebohm from the London 2012 Olympics. A few minutes later in
Heat 5, Regan Smith
broke the record again. Then in another few minutes in Heat 6, Kaylee McKeown
broke the record again.
On the following day in Semifinal 1, Regan took back the record. Then, on the
following
day in the final, Kaylee reclaimed the record. At the Paris 2024 Olympics,
@@ -72,9 +95,13 @@ Regan led off the 4 x 100m medley relay and broke the
backstroke record swimmin
That makes 7 times the record was broken across the 2 games!</p>
</div>
<div class="paragraph">
+<p><span class="image"><img src="img/BackstrokeRecord.png" alt="Result of
Semifinal1" width="70%"></span></p>
+</div>
+<div class="paragraph">
<p>We’ll have vertices in our graph database corresponding to the
swimmers and the swims.
-We’ll use the labels <code>swimmer</code> and <code>swim</code> for
these vertices. We’ll have relationships
-such as <code>swam</code> and <code>supercedes</code> between vertices.
We’ll explore modelling and querying the event
+We’ll use the labels <code>Swimmer</code> and <code>Swim</code> for
these vertices. We’ll have relationships
+such as <code>swam</code> and <code>supersedes</code> between vertices.
+We’ll explore modelling and querying the event
information using several graph database technologies.</p>
</div>
<div class="paragraph">
@@ -84,6 +111,100 @@ information using several graph database technologies.</p>
</div>
</div>
<div class="sect1">
+<h2 id="_why_graph_databases">Why graph databases?</h2>
+<div class="sectionbody">
+<div class="paragraph">
+<p>RDBMS systems are many times more popular than graph databases.
+This blog post doesn’t aim to convert everyone to use graph databases
all the time,
+but we’ll show you some examples of when it might make sense and let you
make up your own mind.</p>
+</div>
+<div class="paragraph">
+<p>Graph databases are known for more succinct queries
+and vastly more efficient queries in some scenarios.
+Which scenarios? Usually, it boils down to relationships.
+If there are important relationships between data in your system,
+graph databases might make sense.</p>
+</div>
+<div class="paragraph">
+<p>As a first example, do you prefer this Cypher query (it’s from the
TuGraph code we’ll see later
+but other technologies are similar):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="sql">MATCH
(sr:Swimmer)-[:swam]->(sm:Swim {at: 'Paris 2024'})
+RETURN DISTINCT sr.country AS country</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Or the equivalent SQL query assuming we were storing
+the information in relational tables:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="sql">SELECT DISTINCT
country FROM Swimmer
+LEFT JOIN Swimmer_Swim
+ ON Swimmer.swimmerId = Swimmer_Swim.fkSwimmer
+LEFT JOIN Swim
+ ON Swim.swimId = Swimmer_Swim.fkSwim
+WHERE Swim.at = 'Paris 2024'</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This SQL query is typical of what is required when we have a many-to-many
relationship
+between our entities, in this case <em>swimmers</em> and <em>swims</em>.
Many-to-many is required to
+correctly model relay swims like the last record swim (though for brevity, we
haven’t
+included the other relay swimmers in our dataset). The multiple joins in that
query
+can also be notoriously slow for large datasets.</p>
+</div>
+<div class="paragraph">
+<p>We’ll see other examples later too, one being a query involving
traversal of relationships.
+Here is the Cypher (again from TuGraph):</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="sql">MATCH
(s1:Swim)-[:supersedes*1..10]->(s2:Swim {at: 'London 2012'})
+RETURN s1.at as at, s1.event as event</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>And the equivalent SQL:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="sql">WITH RECURSIVE
traversed(swimId) AS (
+ SELECT fkNew FROM Supersedes
+ WHERE fkOld IN (
+ SELECT swimId FROM Swim
+ WHERE event = 'Heat 4' AND at = 'London 2012'
+ )
+ UNION ALL
+ SELECT Supersedes.fkNew as swimId
+ FROM traversed as t
+ JOIN Supersedes
+ ON t.swimId = Supersedes.fkOld
+ WHERE t.swimId = swimId
+)
+SELECT at, event FROM Swim
+WHERE swimId IN (SELECT * FROM traversed)</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>Here we have a <code>Supersedes</code> table and a recursive common table expression,
<code>traversed</code>.
+The details aren’t important, but it shows the kind of complexity
typically
+required for the kind of relationship traversal we are looking at.
+There are certainly far more complex SQL examples for different kinds of
+traversals like shortest path.</p>
+</div>
+<div class="paragraph">
+<p>Now, it’s time to explore the case study using our different database
technologies.
+We tried to pick technologies that seem reasonably well maintained, have
reasonable
+JVM support, and have features that seemed worth showing off. Several we
+selected because they have TinkerPop support. TinkerPop is a Groovy-based
technology
+and will be the first technology we explore.</p>
+</div>
+</div>
+</div>
+<div class="sect1">
<h2 id="_apache_tinkerpop">Apache TinkerPop</h2>
<div class="sectionbody">
<div class="paragraph">
@@ -96,8 +217,9 @@ information using several graph database technologies.</p>
<p>TinkerPop is an open source computing framework for graph databases. It
provides
a common abstraction layer, and a graph query language, called Gremlin.
This allows you to work with numerous graph database implementations in a
consistent way.
-TinkerPop also provides its own graph engine implementation, called
TinkerGraph, which is what
-we’ll use initially.</p>
+TinkerPop also provides its own graph engine implementation, called
TinkerGraph,
+which is what we’ll use initially. TinkerPop/Gremlin will be a
technology we revisit
+for other databases later.</p>
</div>
<div class="paragraph">
<p>We’ll look at the swims for the medalists and record breakers at the
Tokyo 2021 and Paris 2024 Olympics
@@ -402,16 +524,42 @@ Let’s use some dynamic metaprogramming to achieve
just that.</p>
</div>
</div>
<div class="paragraph">
-<p>Now we use normal Groovy property access for setting the node properties.
It looks much cleaner.
+<p>What does this do? The propertyMissing lines catch attempts to use
Groovy’s
+normal property access and funnel them through the <code>getProperty</code>
and <code>setProperty</code> methods.
+The methodMissing line means any attempted method calls that we don’t
recognize
+are intended to be relationship creation, so we funnel them through the
appropriate
+method call.</p>
+</div>
+<div class="paragraph">
+<p>Now we can use normal Groovy property access for setting the node
properties.
+It looks much cleaner.
We define an edge relationship simply by calling a method having the
relationship name.</p>
</div>
<div class="listingblock">
<div class="content">
-<pre class="prettyprint highlight"><code data-lang="groovy">km =
tx.createNode(label('swimmer'))
+<pre class="prettyprint highlight"><code data-lang="groovy">km =
tx.createNode(label('Swimmer'))
km.name = 'Kylie Masse'
-km.country = '🇨🇦'
-
-swim2 = tx.createNode(label('swim'))
+km.country = '🇨🇦'</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>The code is already a little cleaner, but we can tweak the metaprogramming
a little
+more to get rid of the noise associated with the <code>label</code> method:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code
data-lang="groovy">Transaction.metaClass {
+ createNode { String labelName -> delegate.createNode(label(labelName)) }
+}</code></pre>
+</div>
+</div>
+<div class="paragraph">
+<p>This adds an overload for <code>createNode</code> that takes a
<code>String</code>, and
+node creation is improved again, as we can see here:</p>
+</div>
+<div class="listingblock">
+<div class="content">
+<pre class="prettyprint highlight"><code data-lang="groovy">swim2 =
tx.createNode('Swim')
swim2.time = 58.17d
swim2.result = 'First'
swim2.event = 'Heat 4'
@@ -419,7 +567,7 @@ swim2.at = 'Tokyo 2021'
km.swam(swim2)
swim2.supercedes(swim1)
-swim3 = tx.createNode(label('swim'))
+swim3 = tx.createNode('Swim')
swim3.time = 57.72d
swim3.result = '🥈'
swim3.event = 'Final'
@@ -428,8 +576,9 @@ km.swam(swim3)</code></pre>
</div>
</div>
<div class="paragraph">
-<p>The code is certainly a lot cleaner, and it was quite a minimal amount of
work to define the necessary
-metaprogramming. With a little bit more work, we could use static
metaprogramming techniques.
+<p>The code for relationships is certainly a lot cleaner too,
+and it was quite a minimal amount of work to define the necessary
metaprogramming.
+With a little bit more work, we could use static metaprogramming techniques.
This would give us better IDE completion.</p>
</div>
<div class="paragraph">
@@ -1135,74 +1284,6 @@ assert run('''
''')*.asMap().each{ println "$it.at $it.event" }</code></pre>
</div>
</div>
-<div class="sidebarblock">
-<div class="content">
-<div class="title">An Aside on Graph Databases</div>
-<div class="paragraph">
-<p>Graph databases are known for more succinct queries
-and vastly more efficient queries in some scenarios.
-Do you prefer this cypher query:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="prettyprint highlight"><code data-lang="sql">MATCH
(sr:Swimmer)-[:swam]->(sm:Swim {at: 'Paris 2024'})
-RETURN DISTINCT sr.country AS country</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Or the equivalent SQL query assuming we were storing all the information in
tables:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="prettyprint highlight"><code data-lang="sql">SELECT DISTINCT
country FROM Swimmer
-LEFT JOIN Swimmer_Swim
- ON Swimmer.swimmerId = Swimmer_Swim.fkSwimmer
-LEFT JOIN Swim
- ON Swim.swimId = Swimmer_Swim.fkSwim
-WHERE Swim.at = 'Paris 2024'</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Here we are assuming a many-to-many relationship between <em>swimmers</em>
and <em>swims</em>
-which is what is required to correctly model relay swims.</p>
-</div>
-<div class="paragraph">
-<p>For the traversal case, the difference is even more obvious.
-Here is the cypher:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="prettyprint highlight"><code data-lang="sql">MATCH
(s1:Swim)-[:supersedes*1..10]->(s2:Swim {at: 'London 2012'})
-RETURN s1.at as at, s1.event as event</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>And the equivalent cypher:</p>
-</div>
-<div class="listingblock">
-<div class="content">
-<pre class="prettyprint highlight"><code data-lang="sql">WITH RECURSIVE
traversed(swimId) AS (
- SELECT fkNew FROM Supersedes
- WHERE fkOld IN (
- SELECT swimId FROM Swim
- WHERE event = 'Heat 4' AND at = 'London 2012'
- )
- UNION ALL
- SELECT Supersedes.fkNew as swimId
- FROM traversed as t
- JOIN Supersedes
- ON t.swimId = Supersedes.fkOld
- WHERE t.swimId = swimId
-)
-SELECT at, event FROM Swim
-WHERE swimId IN (SELECT * FROM traversed)</code></pre>
-</div>
-</div>
-<div class="paragraph">
-<p>Here we have a <code>Supersedes</code> table and a recursive SQL function,
<code>traversed</code>.</p>
-</div>
-</div>
-</div>
</div>
</div>
<div class="sect1">
diff --git a/blog/img/BackstrokeRecord.png b/blog/img/BackstrokeRecord.png
new file mode 100644
index 0000000..c55e62f
Binary files /dev/null and b/blog/img/BackstrokeRecord.png differ