This is an automated email from the ASF dual-hosted git repository.
paulk pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/groovy-website.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 5122ee7 add section on static typing
5122ee7 is described below
commit 5122ee751356d529f3e434d08a4ed9950b36d8b2
Author: Paul King <[email protected]>
AuthorDate: Mon Sep 2 21:27:45 2024 +1000
add section on static typing
---
site/src/site/blog/groovy-graph-databases.adoc | 176 ++++++++++++++++++-------
1 file changed, 127 insertions(+), 49 deletions(-)
diff --git a/site/src/site/blog/groovy-graph-databases.adoc
b/site/src/site/blog/groovy-graph-databases.adoc
index 73f6d40..63520b4 100644
--- a/site/src/site/blog/groovy-graph-databases.adoc
+++ b/site/src/site/blog/groovy-graph-databases.adoc
@@ -1,14 +1,14 @@
= Using Graph Databases with Groovy
Paul King
:revdate: 2024-08-20T10:18:00+00:00
-:keywords: tugraph, tinkerpop, gremlin, neo4j, apache age, graph databases,
apache hugegraph, orientdb, arcadedb, orientdb, groovy
+:keywords: tugraph, tinkerpop, gremlin, neo4j, apache age, graph databases,
apache hugegraph, arcadedb, orientdb, groovy
:draft: true
:description: This post illustrates using graph databases with Groovy.
-In this blog post, we look at using graph databases with Groovy.
+In this blog post, we look at using property graph databases with Groovy.
We'll look at:
-* Some advantages of graph database technologies
+* Some advantages of property graph database technologies
* Some features of Groovy which make using such databases a little nicer
* Code examples for a common case study across 7 interesting graph databases
@@ -27,7 +27,7 @@ On the following day in Semifinal 1, Regan took back the
record. Then, on the fo
day in the final, Kaylee reclaimed the record. At the Paris 2024 Olympics,
Kaylee bettered her own record in the final. Then a few days later,
Regan led off the 4 x 100m medley relay and broke the backstroke record
swimming the first leg.
-That makes 7 times the record was broken across the 2 games!
+That makes 7 times the record was broken across the last 2 games!
image:img/BackstrokeRecord.png[Result of Semifinal1,70%]
@@ -42,16 +42,20 @@ https://github.com/paulk-asert/groovy-graphdb/[GitHub].
== Why graph databases?
-RDBMS systems are many times more popular than graph databases.
+RDBMS systems are many times more popular than graph databases, but there is a
+range of scenarios where graph databases are often used.
+Which scenarios? Usually, it boils down to relationships.
+If there are important relationships between data in your system,
+graph databases might make sense.
+Typical usage scenarios include fraud detection, knowledge graphs, recommendation engines,
+social networks, and supply chain management.
+
This blog post doesn't aim to convert everyone to use graph databases all the
time,
but we'll show you some examples of when it might make sense and let you make
up your own mind.
+Graph databases certainly represent a very useful tool to have in your toolbox
should the need arise.
Graph databases are known for more succinct queries
and vastly more efficient queries in some scenarios.
-Which scenarios? Usually, it boils down to relationships.
-If there are important relationships between data in your system,
-graph databases might make sense.
-
As a first example, do you prefer this cypher query (it's from the TuGraph
code we'll see later
but other technologies are similar):
@@ -153,8 +157,8 @@ at the London 2012 Olympics. Emily Seebohm set that record
in Heat 4:
[source,groovy]
----
-var es = g.addV('swimmer').property(name: 'Emily Seebohm', country: '🇦🇺').next()
-swim1 = g.addV('swim').property(at: 'London 2012', event: 'Heat 4', time: 58.23, result: 'First').next()
+var es = g.addV('Swimmer').property(name: 'Emily Seebohm', country: '🇦🇺').next()
+swim1 = g.addV('Swim').property(at: 'London 2012', event: 'Heat 4', time: 58.23, result: 'First').next()
es.addEdge('swam', swim1)
----
@@ -197,11 +201,11 @@ Let's create some helper methods to simplify creation of
the remaining informati
[source,groovy]
----
def insertSwimmer(TraversalSource g, name, country) {
- g.addV('swimmer').property(name: name, country: country).next()
+ g.addV('Swimmer').property(name: name, country: country).next()
}
def insertSwim(TraversalSource g, at, event, time, result, swimmer) {
- var swim = g.addV('swim').property(at: at, event: event, time: time,
result: result).next()
+ var swim = g.addV('Swim').property(at: at, event: event, time: time,
result: result).next()
swimmer.addEdge('swam', swim)
swim
}
@@ -213,12 +217,12 @@ Now we can create the remaining swim information:
----
var km = insertSwimmer(g, 'Kylie Masse', '🇨🇦')
var swim2 = insertSwim(g, 'Tokyo 2021', 'Heat 4', 58.17, 'First', km)
-swim2.addEdge('supercedes', swim1)
+swim2.addEdge('supersedes', swim1)
var swim3 = insertSwim(g, 'Tokyo 2021', 'Final', 57.72, '🥈', km)
var rs = insertSwimmer(g, 'Regan Smith', '🇺🇸')
var swim4 = insertSwim(g, 'Tokyo 2021', 'Heat 5', 57.96, 'First', rs)
-swim4.addEdge('supercedes', swim2)
+swim4.addEdge('supersedes', swim2)
var swim5 = insertSwim(g, 'Tokyo 2021', 'Semifinal 1', 57.86, '', rs)
var swim6 = insertSwim(g, 'Tokyo 2021', 'Final', 58.05, '🥉', rs)
var swim7 = insertSwim(g, 'Paris 2024', 'Final', 57.66, '🥈', rs)
@@ -226,13 +230,13 @@ var swim8 = insertSwim(g, 'Paris 2024', 'Relay leg1',
57.28, 'First', rs)
var kmk = insertSwimmer(g, 'Kaylee McKeown', '🇦🇺')
var swim9 = insertSwim(g, 'Tokyo 2021', 'Heat 6', 57.88, 'First', kmk)
-swim9.addEdge('supercedes', swim4)
-swim5.addEdge('supercedes', swim9)
+swim9.addEdge('supersedes', swim4)
+swim5.addEdge('supersedes', swim9)
var swim10 = insertSwim(g, 'Tokyo 2021', 'Final', 57.47, '🥇', kmk)
-swim10.addEdge('supercedes', swim5)
+swim10.addEdge('supersedes', swim5)
var swim11 = insertSwim(g, 'Paris 2024', 'Final', 57.33, '🥇', kmk)
-swim11.addEdge('supercedes', swim10)
-swim8.addEdge('supercedes', swim11)
+swim11.addEdge('supersedes', swim10)
+swim8.addEdge('supersedes', swim11)
var kb = insertSwimmer(g, 'Katharine Berkoff', '🇺🇸')
var swim12 = insertSwim(g, 'Paris 2024', 'Final', 57.98, '🥉', kb)
@@ -240,8 +244,8 @@ var swim12 = insertSwim(g, 'Paris 2024', 'Final', 57.98, '🥉', kb)
Note that we just entered the swims where medals were won or
where olympic records were broken. We could easily have added
-more swimmers, other strokes and distances, and even other sports
-if we wanted to.
+more swimmers, other strokes and distances, relay events,
+and even other sports if we wanted to.
Let's have a look at what our graph now looks like:
@@ -249,7 +253,7 @@
image:https://raw.githubusercontent.com/paulk-asert/groovy-graphdb/main/docs/ima
We now might want to query the graph in numerous ways.
For instance, what countries had success at the Paris 2024 olympics,
-where success is defined for the purposes of this query as
+where success is defined, for the purposes of this query, as
winning a medal or breaking a record. Of course, just having
a swimmer make the olympic team is a great success - but let's
keep our example simple for now.
@@ -272,7 +276,7 @@ Similarly, we can find the olympic records set during heat
swims:
[source,groovy]
----
-var recordSetInHeat = g.V().hasLabel('swim')
+var recordSetInHeat = g.V().hasLabel('Swim')
.filter { it.get().property('event').value().startsWith('Heat') }
.values('at').toSet()
assert recordSetInHeat == ['London 2012', 'Tokyo 2021'] as Set
@@ -301,9 +305,17 @@ var recordTimesInFinals = g.V.has('event',
'Final').as('ev').out('supersedes').s
assert recordTimesInFinals == [57.47, 57.33] as Set
----
-But graph databases really excel when performing queries
-involving multiple edge traversals. Here is one looking
-at all the olympic records set in 2021 and 2024:
+Groovy happens to be very good at allowing you to add syntactic sugar
+for your own programs or existing classes. TinkerPop's special Groovy support
+is just one example of this. Your vendor could certainly supply such a feature
+for your favorite graph database (why not ask them?) but we'll look shortly at
+how you could write such syntactic sugar yourself when we explore Neo4j.
+
+Our examples so far are all interesting,
+but graph databases really excel when performing queries
+involving multiple edge traversals. Let's look
+at all the olympic records set in 2021 and 2024,
+i.e. all records set after London 2012 (`swim1` from earlier):
[source,groovy]
----
@@ -334,8 +346,8 @@ Paris 2024 Final
Paris 2024 Relay leg1
----
-As a side note, TinkerPop has a `GraphMLWriter` class which can write out our
-graph in _GraphML_, which is how the above image was created.
+NOTE: While not important for our examples, TinkerPop has a `GraphMLWriter`
class which can write out our
+graph in _GraphML_, which is how the earlier image of Graphs and Nodes was
initially generated.
== Neo4j
@@ -405,10 +417,10 @@ Node.metaClass {
----
What does this do? The propertyMissing lines catch attempts to use Groovy's
-normal property access and funnels then through the `getProperty` and
`setProperty` methods.
+normal property access and funnel them through the appropriate `getProperty` and `setProperty` methods.
The methodMissing line means any attempted method calls that we don't recognize
are intended to be relationship creation, so we funnel them through the
appropriate
-method call.
+`createRelationshipTo` method call.
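To see the mechanics in isolation, here is a minimal sketch that applies the same three hooks to a stand-in class. Everything in it is an assumption made for illustration: `FakeNode`, `readProp`, `writeProp` and `relateTo` are hypothetical names, not part of Neo4j, standing in for a node's property and relationship methods.

```groovy
// Hypothetical stand-in for a database node (not the Neo4j Node API)
class FakeNode {
    Map<String, Object> store = [:]
    List rels = []
    Object readProp(String key) { store[key] }
    void writeProp(String key, Object value) { store[key] = value }
    void relateTo(FakeNode other, String type) { rels << [type, other] }
}

FakeNode.metaClass {
    // reads of unknown properties are routed to readProp
    propertyMissing { String name -> delegate.readProp(name) }
    // writes to unknown properties are routed to writeProp
    propertyMissing { String name, value -> delegate.writeProp(name, value) }
    // unknown method names are treated as relationship types
    methodMissing { String name, args -> delegate.relateTo(args[0], name) }
}

var km = new FakeNode()
km.name = 'Kylie Masse'      // handled by the two-argument propertyMissing
var swim2 = new FakeNode()
swim2.time = 58.17
km.swam(swim2)               // handled by methodMissing

assert km.name == 'Kylie Masse'
assert km.store == [name: 'Kylie Masse']
assert km.rels == [['swam', swim2]]
```

The metaclass changes are registered before any instances are created, so every `FakeNode` picks them up.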
Now we can use normal Groovy property access for setting the node properties.
It looks much cleaner.
@@ -442,7 +454,7 @@ swim2.result = 'First'
swim2.event = 'Heat 4'
swim2.at = 'Tokyo 2021'
km.swam(swim2)
-swim2.supercedes(swim1)
+swim2.supersedes(swim1)
swim3 = tx.createNode('Swim')
swim3.time = 57.72d
@@ -454,21 +466,16 @@ km.swam(swim3)
The code for relationships is certainly a lot cleaner too,
and it was quite a minimal amount of work to define the necessary
metaprogramming.
+
With a little bit more work, we could use static metaprogramming techniques.
This would give us better IDE completion.
-
-Another interesting topic which we won't elaborate here is stronger type
checking for graphs.
-For graph libraries which support schemas, the types for node and edge
properties can be defined,
-as can the allowable nodes applicable to any edge relationship. For such
systems, if you try to
-define a poorly-typed property, or incorrectly use a relationship, you will
receive a runtime error.
-Groovy lets us take things further, if we want, and if we are willing to do a
little more work.
-For example, if the schema is available at compile time, we could write a type
checking extension
-which would fail compilation if any invalid edge or vertex definitions were
detected.
-
+We'll have more to say about improved type checking at the end of this post.
For now though, let's continue with defining the rest of our graph.
+
We can redefine our `insertSwimmer` and `insertSwim` methods using Neo4j
implementation
calls, and then our earlier code could be used to create our graph. Now let's
-investigate what the queries look like.
+investigate what the queries look like. We'll start with querying via
+the API, and later look at using Cypher.
First, the successful countries in Paris 2024:
@@ -499,7 +506,7 @@ Now, what were the times for records broken in finals:
[source,groovy]
----
var recordTimesInFinals = swims.findAll { swim ->
- swim.event == 'Final' && swim.hasRelationship(supercedes)
+ swim.event == 'Final' && swim.hasRelationship(supersedes)
}*.time
assert recordTimesInFinals == [57.47d, 57.33d]
----
@@ -522,7 +529,7 @@ for (Path p in tx.traversalDescription()
----
Earlier versions of Neo4j also supported Gremlin, so we could have written our
queries in
-the same was as we did for TinkerPop. That technology is deprecated for Neo4j,
and instead
+the same way as we did for TinkerPop. That technology is deprecated in recent Neo4j versions, and instead
they now offer a Cypher query language. We can use that language for all of
our previous queries
as shown here:
@@ -548,10 +555,10 @@ RETURN s1
}
----
-=== An aside on graph design
-
+.An aside on graph design
+****
This blog post is definitely not meant to be an advanced course on graph database
-design, but it is worth pointing out a few points.
+design, but it is worth noting a few points.
Deciding which information should be stored as node properties and which as
relationships
still requires developer judgement. For example, we could have added a Boolean
`olympicRecord`
@@ -567,7 +574,7 @@ We could write a query to find this as follows:
[source,groovy]
----
assert tx.execute('''
-MATCH (sr1:swimmer)-[:swam]->(sm1:swim {event: 'Final'}), (sm2:swim {event:
'Final'})-[:supercedes]->(sm3:swim)
+MATCH (sr1:swimmer)-[:swam]->(sm1:swim {event: 'Final'}), (sm2:swim {event:
'Final'})-[:supersedes]->(sm3:swim)
WHERE sm1.at = sm2.at AND sm1 <> sm2 AND sm1.time < sm3.time
RETURN sr1.name as name
''')*.name == ['Kylie Masse']
@@ -595,7 +602,7 @@ The resulting query becomes this:
[source,groovy]
----
assert tx.execute('''
-MATCH (sr1:swimmer)-[:swam]->(sm1:swim {event:
'Final'})-[:runnerup]->{1,2}(sm2:swim {event:
'Final'})-[:supercedes]->(sm3:swim)
+MATCH (sr1:swimmer)-[:swam]->(sm1:swim {event:
'Final'})-[:runnerup]->{1,2}(sm2:swim {event:
'Final'})-[:supersedes]->(sm3:swim)
WHERE sm1.time < sm3.time
RETURN sr1.name as name
''')*.name == ['Kylie Masse']
@@ -603,6 +610,7 @@ RETURN sr1.name as name
The _MATCH_ clause is similar in complexity, but the _WHERE_ clause is much simpler.
The query is probably faster too, but it is a tradeoff that should be weighed
up.
+****
== Apache AGE
@@ -1210,3 +1218,73 @@ gremlin.gremlin('''
println "$a $e"
}
----
+
+== Static typing
+
+Another interesting topic is improving type checking for graph database code.
+Groovy supports very dynamic styles of code through to "stronger-than-Java"
type checking.
+
+Some graph database technologies offer only a schema-free experience
+to allow your data models to _"adapt and change easily with your business"_.
+Others allow a schema to be defined with varying degrees of information.
+Groovy's dynamic capabilities make it particularly suited for writing code
+that will work easily even if you change your data model on the fly.
+However, if you prefer to add further type checking into your code, Groovy has
+options for that too.
+
+Let's recap on what schema-like capabilities our examples made use of:
+
+* Apache TinkerPop: used dynamic vertex labels and edges
+* Neo4j: used dynamic vertex labels but required edges to be defined by an enum
+* Apache AGE: defined vertex labels (although not shown in this post), edges were dynamic
+* OrientDB: defined vertex and edge classes
+* ArcadeDB: defined vertex and edge types
+* TuGraph: defined vertex and edge labels, vertex labels had typed properties,
edge labels typed with from/to vertex labels
+* Apache HugeGraph: defined vertex and edge labels, vertex labels had typed
properties, edge labels typed with from/to vertex labels
+
+The good news is that where we chose very dynamic options, we could easily add new
+vertices and edges, e.g.:
+
+[source,groovy]
+----
+var mb = g.addV('Coach').property(name: 'Michael Bohl').next()
+mb.coaches(kmk)
+----
+
+For the examples which used schema-like capabilities, we'd need to declare the
additional
+vertex type `Coach` and edge `coaches` before we could define the new node and
edge.
+Let's explore just a few options where Groovy capabilities could make it
easier to deal
+with typing.
+
+We previously used `insertSwimmer` and `insertSwim` helper methods. We could
supply types
+for those parameters even where our underlying database technology wasn't
using them.
+That would at least capture typing errors when inserting information into our
graph.
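As a rough sketch of that idea, the helpers below declare parameter and return types. To keep the example runnable without a graph library, it assumes a hypothetical in-memory `SketchVertex` class in place of a real database vertex, and it drops the traversal source parameter used by the earlier helpers.

```groovy
// Hypothetical in-memory stand-in for a vertex, so the sketch runs without a graph library
class SketchVertex {
    String label
    Map<String, Object> props = [:]
    List<SketchVertex> swam = []
}

SketchVertex insertSwimmer(String name, String country) {
    new SketchVertex(label: 'Swimmer', props: [name: name, country: country])
}

SketchVertex insertSwim(String at, String event, BigDecimal time, String result, SketchVertex swimmer) {
    var swim = new SketchVertex(label: 'Swim', props: [at: at, event: event, time: time, result: result])
    swimmer.swam << swim   // records the 'swam' edge on the stand-in
    swim
}

var km = insertSwimmer('Kylie Masse', '🇨🇦')
var swim2 = insertSwim('Tokyo 2021', 'Heat 4', 58.17, 'First', km)
assert km.swam == [swim2]

// A mistyped time is rejected because no insertSwim signature accepts a String there:
// insertSwim('Tokyo 2021', 'Heat 4', 'fast', 'First', km)
```

In dynamic code the commented-out call fails at run time with a `MissingMethodException`. Annotating the calling code with `@groovy.transform.TypeChecked` would move that failure to compile time.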
+
+We could use a richly-typed domain using Groovy classes or records. We could
generate
+the necessary method calls to create the schema/labels and then populate the
database.
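One way this might look, assuming Groovy 4 or later for record support: the record names below mirror the labels used in this post, and `vertexSpec` is a hypothetical helper that derives a label and property map from any record instance.

```groovy
// Hypothetical domain records; names and fields mirror the labels used in this post
record Swimmer(String name, String country) {}
record Swim(String at, String event, BigDecimal time, String result) {}

// Hypothetical helper: the label comes from the record's class name,
// the property map from its components (Groovy records provide toMap())
def vertexSpec(rec) {
    [label: rec.getClass().simpleName, props: rec.toMap()]
}

var spec = vertexSpec(new Swim('Tokyo 2021', 'Heat 4', 58.17, 'First'))
assert spec.label == 'Swim'
assert spec.props == [at: 'Tokyo 2021', event: 'Heat 4', time: 58.17, result: 'First']

// Assumption: with TinkerPop, as used earlier in this post, the spec could be applied as
// g.addV(spec.label).property(spec.props).next()
```

The same records could also drive schema creation for the databases that need labels declared up front, since the class name and component names are available before any data is inserted.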
+
+Alternatively, we can leave the code in its dynamic form and make use of
Groovy's
+extensible type checking system. We could write an extension which
+fails compilation if any invalid edge or vertex definitions were detected.
+For our `coaches` example above, the `mb.coaches(kmk)` line would pass compilation,
+but if we had incorrect vertices for that edge relationship, compilation would fail,
+e.g. for the statement `swim1.coaches(mb)`, we'd get the following error:
+
+----
+[Static type checking] - Invalid edge - expected: <Coach>.coaches(<Swimmer>)
+but found: <Swim>.coaches(<Coach>)
+@ line 20, column 5.
+swim1.coaches(mb)
+^
+
+1 error
+----
+
+We won't show the code for this; it's in the GitHub repo. It is hard-coded to
+know about the `coaches` relationship. Ideally, we'd combine extensible type
checking
+with the previously mentioned richly-typed model, and we could populate both
the
+information that our type checker needs and any label/schema information our
+graph database would need.
+
+Anyway, these are just a few options Groovy gives you. Why not have fun trying out some
+ideas yourself!
\ No newline at end of file