Re: A meta model for gremlin's property graph

pieter gmail Sat, 15 Jan 2022 09:49:44 -0800

Hi,

Here are some thoughts on your response.


> which parts of the approach you describe below were influenced by OMG

The primary inspiration from UML is the insight that a language can be
self-describing. In practice this is unavoidable: otherwise every meta
level needs yet another meta level to describe it, and we cannot
tolerate that infinite regress.
This proposal is precisely an attempt at gremlin describing itself
without recourse to any other language.

> +1 to using or drawing upon standards where we can. 

To be clear, I am not using any OMG standard as such. If we were to do
that, we would define the property graph model using MOF (Meta Object
Facility) or its counterpart EMF. While this is entirely possible, it
is not the approach taken here. Here the attempt is to bootstrap the
property graph model entirely and only with gremlin.

> The problem right now is that Gremlin's declarative semantics aren't
> very clear, and it is a relatively complex language.

This is not an attempt at a specification of the gremlin language. It
is only an attempt at formally specifying the implicit property graph
model assumed by the gremlin language. My understanding is that the
gremlin language will be formally defined by the antlr grammar,
accompanied by documentation in English.

> I like the term "schema".

+1

> I agree, and I think there is value in going one step further to
> create a general purpose data model for defining data models, with
> property graphs as a special case.

Here I do not agree. While there certainly is value in meta meta
models, I do not think designing a new one belongs in TinkerPop.
TinkerPop is about the gremlin language and the property graph model,
not about meta meta models. Creating deeper, more abstract models, with
all that it entails, is in my opinion a huge task that has little to do
with TinkerPop, gremlin and its property graph model.

>  the classic graph ("data graph") has elements "Marko", "Josh",
> "ripple" etc. each of which is a value together with a type and a
> name

Here it is the same critique. There is no need to say that a vertex
together with its label is in fact a type with a name. Type is not a
notion in gremlin, nor a notion in our meta model, so it's not part of
our language.

> Cool, except that I would banish types like Date and Time

I have no strong intuitions about this art/science. Perhaps the meta
model should be extended to provide some support for non-primitive
data types.

> > int8, ...

I was actually hoping to avoid an arbitrary attempt at defining a
long list of possible primitives. I looked around, but there seems to
be no standards body for this; every language and database defines
its own types. Perhaps the long list is the only solution?
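
For concreteness, here is a rough sketch of what the long list might
look like if it were folded into the GremlinDataType enum from the
original proposal. The type names are the ones Josh listed; the Kind
grouping, the bits field (0 meaning "no fixed width") and the ofKind
helper are my own assumptions for illustration only, not anything
agreed or part of TinkerPop.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a longer primitive list for the meta model.
// Type names follow the list proposed in this thread; the fields and
// helper method are hypothetical.
enum GremlinDataType {
    STRING(Kind.TEXT, 0),
    BIGINT(Kind.SIGNED_INT, 0),   // 0 = arbitrary precision, no fixed width
    INT8(Kind.SIGNED_INT, 8),
    INT16(Kind.SIGNED_INT, 16),
    INT32(Kind.SIGNED_INT, 32),
    INT64(Kind.SIGNED_INT, 64),
    UINT8(Kind.UNSIGNED_INT, 8),
    UINT16(Kind.UNSIGNED_INT, 16),
    UINT32(Kind.UNSIGNED_INT, 32),
    UINT64(Kind.UNSIGNED_INT, 64),
    BIGFLOAT(Kind.FLOAT, 0),      // 0 = arbitrary precision, no fixed width
    FLOAT32(Kind.FLOAT, 32),
    FLOAT64(Kind.FLOAT, 64);

    // Coarse grouping of the primitives (an assumption of this sketch).
    enum Kind { TEXT, SIGNED_INT, UNSIGNED_INT, FLOAT }

    final Kind kind;
    final int bits;

    GremlinDataType(Kind kind, int bits) {
        this.kind = kind;
        this.bits = bits;
    }

    // Returns all primitives of the given kind, in declaration order.
    static List<GremlinDataType> ofKind(Kind kind) {
        List<GremlinDataType> result = new ArrayList<>();
        for (GremlinDataType type : values()) {
            if (type.kind == kind) {
                result.add(type);
            }
        }
        return result;
    }
}
```

Even this short version shows the problem: every entry is a judgment
call about what counts as primitive.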

> Or the other way around: we define a core model as its own thing
> using a well-defined, controlled vocabulary, then map it into
> Gremlin.

Same critique as above. Letting in another language means gremlin does
not bootstrap itself.

> I don't see your approach of embedding model definitions and
> constraints natively in Gremlin as being at odds with having a formal
> data model.

I'm afraid I do see them as being at odds with one another. Describing
gremlin using another language, be it MOF, EMF or category theory, is
very different from gremlin being self-describing. If we decide against
gremlin describing itself then we should abort this attempt; there is
no point in hacking it.

For what it's worth, this is a bit of a proof of concept, to see if
gremlin can meaningfully self-describe. It has done so for the last 10
years.

Perhaps, before discussing the merits of this approach or another, we
should first decide what we are trying to achieve.

Here goes my understanding of what we are trying to achieve.

1: A property graph meta model, describing exactly what kind of data
structure the gremlin language operates on.
2: The gremlin grammar, together with the documentation, fully
specifies gremlin the language.
3: Extend the gremlin grammar to specify schema create/edit/delete
functionality.
4: Extend the grammar to query the schema. (This can be plain gremlin,
just operating at the schema level.)
5: A language-agnostic specification of how to interact with a remote
gremlin-enabled system, i.e. similar to the JDBC specification, only
without reference to any particular language.
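
To make goals 1 and 4 a little more concrete, here is a toy sketch in
plain Java. It deliberately avoids TinkerPop so that it runs
standalone: nested maps stand in for the schema graph, and
SchemaSketch, modernSchema, propertyKeys and conforms are names I made
up for illustration. In the actual proposal the schema would itself be
a graph and these lookups would be ordinary gremlin traversals.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a schema-level graph: label -> property key -> type name.
// Vertex and edge labels share one map here purely to keep the sketch short.
final class SchemaSketch {

    // Schema of TinkerPop's 'modern' graph, written out as plain data.
    static Map<String, Map<String, String>> modernSchema() {
        Map<String, Map<String, String>> schema = new LinkedHashMap<>();
        schema.put("person", properties("name", "STRING", "age", "INTEGER"));
        schema.put("software", properties("name", "STRING", "lang", "STRING"));
        schema.put("knows", properties("weight", "DOUBLE"));
        schema.put("created", properties("weight", "DOUBLE"));
        return schema;
    }

    // Builds an ordered key -> type map from alternating key/type arguments.
    private static Map<String, String> properties(String... keyThenType) {
        Map<String, String> props = new LinkedHashMap<>();
        for (int i = 0; i < keyThenType.length; i += 2) {
            props.put(keyThenType[i], keyThenType[i + 1]);
        }
        return props;
    }

    // Goal 4 in miniature: query the schema like any other data.
    static Set<String> propertyKeys(Map<String, Map<String, String>> schema, String label) {
        return schema.getOrDefault(label, Collections.emptyMap()).keySet();
    }

    // A data-level element conforms if its label is declared and it only
    // uses property keys declared for that label.
    static boolean conforms(Map<String, Map<String, String>> schema,
                            String label, Set<String> usedKeys) {
        return schema.containsKey(label)
                && propertyKeys(schema, label).containsAll(usedKeys);
    }
}
```

For example, a 'person' with keys name and age conforms to
modernSchema(), while a 'person' carrying a lang property does not.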

As an aside, breaking user space should not even be considered, i.e.
99% backward compatibility should be guaranteed at all times.

Thanks
Pieter




On Tue, 2022-01-11 at 10:47 -0800, Joshua Shinavier wrote:
> Hey Pieter,
> 
> Good to see some more motion on this front. Responses inline.
> 
> 
> On Sun, Jan 9, 2022 at 4:28 AM pieter gmail <[email protected]>
> wrote:
> > Hi,
> > 
> > I have done some work on defining a meta model for Gremlin's
> > property graph. I am using the approach used in the modelling
> > world, in particular as done by the OMG group when defining their
> > various meta models and specifications.
> > 
> 
> 
> +1 to using or drawing upon standards where we can. For those of us
> (including me) who have not worked with OMG standards other than
> occasionally bumping into UML, which parts of the approach you
> describe below were influenced by OMG?
> 
>  
> > However where OMG uses a subset of the UML to define their meta
> > models I suggest we use Gremlin. After all Gremlin is the language
> > we use to describe the world and the property graph meta model can
> > also be described in Gremlin.
> > 
> 
> 
> I agree, as long as these descriptions do not admit "arbitrary
> Gremlin". The problem right now is that Gremlin's declarative
> semantics aren't very clear, and it is a relatively complex language.
> I totally agree that you could define a DSL for defining models which
> could be embedded in Gremlin; you could even define the DSL in terms
> of itself.
> 
>  
> > I propose that we have 3 levels of modelling. Each of which can
> > itself be specified in gremlin.
> > 
> > 1: The property graph meta model.
> > 
> 
> 
> +1
> 
>  
> > 2: The model.
> > 
> 
> 
> I like the term "schema".
> 
>  
> > 3: The graph representing the actual data.
> > 
> 
> 
> +1. Not only is the graph a "model", but depending on how you define
> the modeling DSL, you can also see the other two models as "graphs",
> with types as elements.
> 
>  
> > 1) The property graph meta model describes the nature of the
> > property graph itself. i.e. that property graphs have vertices,
> > edges and properties.
> > 
> 
> 
> I agree, and I think there is value in going one step further to
> create a general purpose data model for defining data models, with
> property graphs as a special case.
> 
>  
> > 2) The model is an instance of the meta model. It describes the
> > schema of a particular graph. i.e. for TinkerPop's modern graph
> > this would be 'person', 'software', 'created' and 'knows' and the
> > properties 'weight', 'age', 'name' and 'lang'.
> > 
> 
> 
> +1
>  
> 
> > 3) The final level is an instance of the model. It is the actual
> > graph itself. i.e. for TinkerPop's modern graph it is 'Marko',
> > 'Josh', 'java' ...
> > 
> 
> 
> Yes. So to elaborate on what I said above about models and graphs,
> let's say we add a schema to the TinkerPop classic graph. The classic
> graph is an instance of the schema, and the schema is an instance of
> a property graph schema. Your three models are three graphs:
> 1) the classic graph ("data graph") has elements "Marko", "Josh",
> "ripple" etc. each of which is a value together with a type and a
> name (id). The type of Marko is "Person" (a named type) and the type
> of ripple is "Project" etc. The value of Marko is the record {"name":
> "marko", "age": 29} while the value of ripple is {"name": "ripple",
> "lang": "java"}.
> 2) the schema of the classic graph ("schema graph") has elements
> "Person", "Project", "knows", and "created". These again are values
> together with types and ids. E.g. the type of "Person" is something
> like {"name": string, "age": int32}, i.e. a record type.
> 3) the schema of the schema of the classic graph -- i.e. the core
> model or what you called the meta model -- is again a graph with
> elements like "Type", "Element", etc. Type expressions in the schema
> of the classic graph are values in the core model. The core model is
> its own schema.
> 
> Decide for yourself if the above makes sense to you, but this is how
> I think of the TinkerPop modeling layer cake these days -- as chained
> models in which the schema of one graph is the data of the next,
> usually arriving at a fixpoint -- the core -- within two steps.
> 
> 
> 
> > 1: Property Graph Meta Model
> > 
> >     public static Graph gremlinMetaModel() {
> >         enum GremlinDataType {
> >             STRING,
> >             INTEGER,
> >             DOUBLE,
> >             DATE,
> >             TIME
> >             //...
> >         }
> > 
> 
> 
> Cool, except that I would banish types like Date and Time from the
> core model. Drawing the line between primitive types and derived
> types is more art than science, but there is enough variation in what
> developers want out of dates/times that I put them on the other side
> of the fence. It also makes implementations easier if you have as few
> baked-in types as possible. On the other hand, I suggest adding many
> more numeric types, e.g. for integers:
> > - bigint
> > - int8
> > - int16
> > - int32
> > - int64
> > - uint8
> > - uint16
> > - uint32
> > - uint64
> 
> and for floating-point numbers:
> > - name: bigfloat
> > - name: float32
> > - name: float64
> 
> 
> > [snip metamodel definition]
> > 
> > 
> > 
> > This can be visualized as,
> > ...
> > 
> 
>  
> 
> I'm not sure if I'm reading this correctly, and I can't see the
> figure yet, but I understand that you are defining the metamodel as a
> graph. Cool.
> 
> 
>  
> > 
> > Notes: 
> > 1) GremlinDataType is an enumeration of named data types that
> > Gremlin supports. All gremlin data types are assumed to be atomic,
> > with their life cycle fully owned by the containing parent. How a
> > value is persisted on disc or transported over the wire is not a
> > concern for the meta model.
> > 
> 
> 
> Agree with most. Primitive/literal types are atomic, but you should
> be also able to define complex data types and bind them to names, and
> that is essentially what you are doing in the above.
> 
>  
> > 2) Gremlin's semantics is too weak to fully specify a valid meta
> > model. Accompanying the meta model we need a list of constraints
> > specified as gremlin queries to augment the semantics of the meta
> > model. These constraints/queries will be able to validate any
> > gremlin specified model for correctness.
> > 
> 
> 
> Or the other way around: we define a core model as its own thing
> using a well-defined, controlled vocabulary, then map it into
> Gremlin.
> 
> 
> > 3) It is trivial to extend the meta model. e.g. To specify
> > something like index support just add an 'Index' vertex and an edge
> > from 'VertexLabel' to it.
> > 
> 
> 
> I would say that you're extending a second-order model in that case.
> The core model / metamodel should be constant, but you can define
> additional models on top of it.
> 
>  
> > Property graph meta model constraints,
> > 
> > [...]
> > 
> 
> 
> Cool (though here, too, I would define constraints in a limited DSL,
> and map them into Gremlin).
> 
>  
> > 2: The model
> > 
> > What follows is an example of TinkerPop's 'modern' graph specified
> > as an instance of the above property graph meta model.
> > [...]
> > 
> 
> 
> Cool.
> 
>  
> > 
> > There are lots of details to complete, but first we need to see if
> > there is any appetite for a modelling approach as I realize there
> > is some academic abstract algebra work happening elsewhere.
> > 
> 
> 
> There is, but I don't see your approach of embedding model
> definitions and constraints natively in Gremlin as being at odds with
> having a formal data model. Have cake, eat it too.
> 
>  
> > It seems to me to have a lower barrier to entry for the community
> > to partake in the discussion of what constitutes a property graph
> > model.
> > 
> 
> 
> That's important. In my opinion, having the formal model defined up
> front gives you more power and flexibility for graph validation,
> transformations, and inference, but having the abstract model, you
> can also build developer-friendly DSLs on top of it.
> 
> 
> > Let me know if there are questions or criticisms.
> > 
> 
> 
> One of the nice things about your proposal is that it doesn't
> increase Java tech debt; you're suggesting defining models using
> Gremlin syntax, which is language variant -neutral. +1 to that.
> 
>  Josh
> 
> 
>
