Re: What makes 'graph traversals' and 'relational joins' the same?

Marko Rodriguez Mon, 22 Apr 2019 13:45:54 -0700

Hey Josh,

I’m digging what you are saying, but the pictures didn’t come through for me.
Can you provide them again (or if dev@ is filtering them, can you give me 
URLs to them)?


Thanks,
Marko.


> On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <j...@fortytwo.net> wrote:
> 
> On the subject of "reified joins", maybe a picture will be worth a few 
> words. As I said in the thread 
> <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on 
> property graph standardization, if you think of vertex labels, edge labels, 
> and property keys as types, each with projections to two other types, there 
> is a nice analogy with relations of two columns, and this analogy can be 
> easily extended to hyper-edges. Here is what the schema of the TinkerPop 
> classic graph looks like if you make each type (e.g. Person, Project, knows, 
> name) into a relation:
> 
> 
> 
> I have made the vertex types salmon-colored, the edge types yellow, the 
> property types green, and the data types blue. The "o" and "I" columns 
> represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. 
> property value type of String) of each relation. More than two arrows from a 
> column represent a coproduct, e.g. the out-type of "name" is Person OR 
> Project. Now you can think of out() and in() as joins of two tables on a 
> primary and foreign key.
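To make the last point concrete, here is a minimal Python sketch (not TinkerPop code) of out() and in() as joins. It assumes each edge type is stored as a relation of (id, out, in) rows, as described above. The sample rows and the helper names `out_step` and `in_step` are illustrative assumptions, not part of any existing API.

```python
# Each edge type is a relation whose rows are (id, out, in).
# The rows below are hypothetical sample data.
knows = [
    ("e7", "v1", "v2"),
    ("e8", "v1", "v4"),
]
created = [
    ("e9", "v1", "v3"),
    ("e11", "v4", "v3"),
]

def out_step(vertex_ids, relation):
    """Join the current vertex ids (primary keys) against the relation's
    'out' column (a foreign key) and project the 'in' column."""
    keys = set(vertex_ids)
    return [in_id for (_id, out_id, in_id) in relation if out_id in keys]

def in_step(vertex_ids, relation):
    """The same join in the other direction: match on 'in', project 'out'."""
    keys = set(vertex_ids)
    return [out_id for (_id, out_id, in_id) in relation if in_id in keys]

friends = out_step(["v1"], knows)      # roughly V(1).out('knows')
creators = in_step(["v3"], created)    # roughly V(3).in('created')
```

Nothing here is specific to binary edges: the join only needs to know which column to match on and which column to project.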
> 
> We are not limited to "out" and "in", however. Here is the ternary 
> relationship (hyper-edge) from the hyper-edge slide 
> <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49>
>  of my Graph Day preso, which has three columns/roles/projections:
> 
> 
> 
> I have drawn Says in light blue to indicate that it is a generalized element; 
> it has projections other than "out" and "in". Now the line between relations 
> and edges begins to blur. E.g. in the following, is PlaceEvent a vertex or a 
> property?
> 
> 
> 
> With the right type system, we can just speak of graph elements, and use 
> "vertex", "edge", "property" when it is convenient. In the relational model, 
> they are relations. If you materialize them in a relational database, they 
> are rows. In any case, you need two basic graph traversal operations:
>         - project() -- forward traversal of the arrows in the above diagrams.
>           Takes you from an element to a component like in-vertex.
>         - select() -- reverse traversal of the arrows. Allows you to answer
>           questions like "in which Trips is John Doe the rider?"
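The two operations above can be sketched in a few lines of Python. This is only an illustration under assumptions: the Trip relation, its role names other than "rider", and the sample rows are hypothetical, and `project` and `select` here are plain functions, not the Gremlin steps of the same name.

```python
# A generalized element type stored as a relation: every row has an id
# plus one column per role. Rows and role names are hypothetical.
trips = [
    {"id": "t1", "driver": "Jane Roe", "rider": "John Doe"},
    {"id": "t2", "driver": "Jane Roe", "rider": "Ann Poe"},
    {"id": "t3", "driver": "Sam Coe", "rider": "John Doe"},
]

def project(element, role):
    """Forward traversal: follow one arrow from an element to a component."""
    return element[role]

def select(relation, role, value):
    """Reverse traversal: find the elements whose given role is `value`."""
    return [row for row in relation if row[role] == value]

# "In which Trips is John Doe the rider?"
johns_trips = select(trips, "rider", "John Doe")
johns_trip_ids = [project(t, "id") for t in johns_trips]
johns_drivers = [project(t, "driver") for t in johns_trips]
```

Chaining select() and then project() is what turns a pair of roles into an edge-like hop, here from a rider to that rider's drivers.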
> 
> Josh
> 
> 
> On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okramma...@gmail.com 
> <mailto:okramma...@gmail.com>> wrote:
> Hello,
> 
> I agree with everything you say. Here is my question:
> 
>         Relational database — join: Table x Table x equality function -> Table
>         Graph database — traverser: Vertex x edge label -> Vertex
> 
> I want a single function that does both. The only thing I could think of was 
> to represent traverser() in terms of join():
> 
>         Graph database — traverser: Vertices x Vertex x equality function -> 
> Vertices
> 
> For example, 
> 
> V().out('address')
> 
>         ==>
> 
> g.join(V().hasLabel('person').as('a'),
>        V().hasLabel('addresses').as('b')).
>          by('name').select(?address vertex?)
> 
> That is, join the vertices with themselves based on some predicate to go from 
> vertices to vertices.
> 
> However, I would like instead to transform the relational database join() 
> concept into a traverser() concept. Kuppitz and I were talking the other day 
> about a link() type operator that says: “try and link to this thing in some 
> specified way.” .. ?? The problem we ran into is again, “link it to what?”
> 
>         - in graph, the ‘to what’ is hardcoded so you don’t need to specify 
> anything.
>         - in rdbms, the ’to what’ is some other specified table.
> 
> So what does the link() operator look like?
> 
> ——
> 
> Some other random thoughts….
> 
> Relational databases join on the table (the whole collection)
> Graph databases traverse on the vertex (an element of the whole collection)
> 
> We can make a relational database join on a single row (by providing a filter 
> to a particular primary key). This is the same as a table with one row. 
> Likewise, for graph in the join() context above:
> 
> V(1).out('address') 
> 
>         ==>
> 
> g.join(V(1).as('a'),
>        V().hasLabel('addresses').as('b')).
>          by('name').select(?address vertex?)
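The point that a traversal from one vertex is a join whose left side has a single row can be checked with a small Python sketch. The vertex tuples, ids, and the `join` helper are hypothetical stand-ins for the join() idea above, not an existing Gremlin step.

```python
# Hypothetical vertices as (id, label, properties).
vertices = [
    (1, "person", {"name": "marko"}),
    (2, "person", {"name": "josh"}),
    (3, "addresses", {"name": "marko", "city": "santafe"}),
    (4, "addresses", {"name": "josh", "city": "desertisland"}),
]

def join(left, right, key):
    """Equijoin two vertex collections on a shared property key."""
    return [(a, b) for a in left for b in right if a[2][key] == b[2][key]]

people = [v for v in vertices if v[1] == "person"]
addresses = [v for v in vertices if v[1] == "addresses"]

# Whole-collection join, keeping the address side: like V().out('address').
all_address_ids = [b[0] for _a, b in join(people, addresses, "name")]

# Same join with a one-row left side: like V(1).out('address').
only_v1 = [v for v in vertices if v[0] == 1]
v1_address_ids = [b[0] for _a, b in join(only_v1, addresses, "name")]
```

The only difference between the two calls is the size of the left collection, which is the sense in which the table-level join and the vertex-level traversal coincide.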
> 
> More thoughts please….
> 
> Marko.
> 
> http://rredux.com
> 
> 
> 
> 
> > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.mar...@gmail.com 
> > <mailto:pieter.mar...@gmail.com>> wrote:
> > 
> > Hi,
> > The way I saw it is that the big difference is that graphs have
> > reified joins. This is both a blessing and a curse.
> > A blessing because it's much easier (less text to type, fewer mistakes,
> > clearer semantics...) to traverse an edge than to construct a manual
> > join. A curse because there are almost always far more ways to traverse
> > a data set than just by the edges some architect might have considered
> > when creating the data set. Often the architect is not the domain
> > expert and the edges are a hardcoded layout of the dataset, which
> > almost certainly won't survive the real world's demands. In graphs, if
> > there are no edges then the data is not reachable, except via indexed
> > lookups. This is the standard engineering problem of database design,
> > but it is important and useful that data can be traversed, joined,
> > without having reified edges.
> > In Sqlg at least, but I suspect it generalizes, I want to create the
> > notion of a "virtual edge", which describes the join in metadata so
> > that the standard to(direction, "virtualEdgeName") will work.
> > In a way this is precisely to keep the graphy nature of Gremlin, i.e.
> > traversing edges, and avoid using the manual join syntax you described.
> > Cheers,
> > Pieter
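As a rough illustration of the "virtual edge" idea, the Python sketch below stores only metadata about a join and evaluates it when the edge is traversed. Everything in it is an assumption made for the example: the `VirtualEdge` class, the `livesAt` name, the tables, and the `to` function are hypothetical and are not Sqlg's API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VirtualEdge:
    """Metadata describing a join; no edge rows are stored."""
    name: str
    out_table: str
    in_table: str
    out_key: str
    in_key: str

# Hypothetical tables and a hypothetical registry of virtual edges.
tables = {
    "people": [{"name": "marko", "age": 29}, {"name": "josh", "age": 35}],
    "addresses": [
        {"name": "marko", "city": "santafe"},
        {"name": "josh", "city": "desertisland"},
    ],
}
virtual_edges = {
    "livesAt": VirtualEdge("livesAt", "people", "addresses", "name", "name"),
}

def to(row, direction, edge_name):
    """Traverse a virtual edge from `row` by evaluating its join on demand."""
    edge = virtual_edges[edge_name]
    if direction == "OUT":
        src_key, dst_table, dst_key = edge.out_key, edge.in_table, edge.in_key
    elif direction == "IN":
        src_key, dst_table, dst_key = edge.in_key, edge.out_table, edge.out_key
    else:
        raise ValueError(f"unknown direction: {direction}")
    return [r for r in tables[dst_table] if r[dst_key] == row[src_key]]

marko = tables["people"][0]
marko_addresses = to(marko, "OUT", "livesAt")
josh_people = to(tables["addresses"][1], "IN", "livesAt")
```

The caller only names the edge and a direction, so the traversal syntax stays "graphy" while the join condition lives in the metadata.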
> > 
> > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote:
> >> Hi,
> >> *** This is mainly for Kuppitz, but if others care. 
> >> Was thinking last night about relational data and Gremlin. The T()
> >> step returns all the tables in the withStructure() RDBMS database.
> >> Tables are ‘complex values’ so they can't leave the VM (only a simple
> >> ‘toString’).
> >> Below is a fake Gremlin session. (and these are just ideas…)
> >>      tables -> a ListLike of rows
> >>      rows -> a MapLike of primitives
> >> gremlin> g.T()
> >> ==>t[people]
> >> ==>t[addresses]
> >> gremlin> g.T('people')
> >> ==>t[people]
> >> gremlin> g.T('people').values()
> >> ==>r[people:1]
> >> ==>r[people:2]
> >> ==>r[people:3]
> >> gremlin> g.T('people').values().asMap()
> >> ==>{name:marko,age:29}
> >> ==>{name:kuppitz,age:10}
> >> ==>{name:josh,age:35}
> >> gremlin> g.T('people').values().has('age',gt(20))
> >> ==>r[people:1]
> >> ==>r[people:3]
> >> gremlin> g.T('people').values().has('age',gt(20)).values('name')
> >> ==>marko
> >> ==>josh
> >> Makes sense. Nice that values() and has() generally apply to all
> >> ListLike and MapLike structures. Also, note how asMap() is the
> >> valueMap() of TP4, but generalizes to anything that is MapLike so it
> >> can be turned into a primitive form as a data-rich result from the
> >> VM.
> >> gremlin> g.T()
> >> ==>t[people]
> >> ==>t[addresses]
> >> gremlin> g.T('addresses').values().asMap()
> >> ==>{name:marko,city:santafe}
> >> ==>{name:kuppitz,city:tucson}
> >> ==>{name:josh,city:desertisland}
> >> gremlin> g.join(T('people').as('a'),T('addresses').as('b')).
> >>            by(select('a').value('name').is(eq(select('b').value('name')))).
> >>            values().asMap()
> >> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
> >> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
> >> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
> >> gremlin> g.join(T('people').as('a'),T('addresses').as('b')).
> >>            by('name'). // shorthand for equijoin on name column/key
> >>            values().asMap()
> >> ==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe}
> >> ==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}
> >> ==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}
> >> gremlin> g.join(T('people').as('a'),T('addresses').as('b')).
> >>            by('name')
> >> ==>t[people<-name->addresses]  // without asMap(), just the complex value 'toString'
> >> gremlin>
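For readers who want to check the expected rows of the people/addresses equijoin in this session, here is a plain-Python sketch using the same data. The `equijoin` function and its alias parameters are hypothetical helpers written for this example; they only mimic the a./b. prefixed output of the proposed join().by('name').

```python
people = [
    {"name": "marko", "age": 29},
    {"name": "kuppitz", "age": 10},
    {"name": "josh", "age": 35},
]
addresses = [
    {"name": "marko", "city": "santafe"},
    {"name": "kuppitz", "city": "tucson"},
    {"name": "josh", "city": "desertisland"},
]

def equijoin(left, right, key, left_alias="a", right_alias="b"):
    """Nested-loop equijoin on `key`; output columns are alias-prefixed."""
    joined = []
    for l_row in left:
        for r_row in right:
            if l_row[key] == r_row[key]:
                row = {f"{left_alias}.{k}": v for k, v in l_row.items()}
                row.update({f"{right_alias}.{k}": v for k, v in r_row.items()})
                joined.append(row)
    return joined

rows = equijoin(people, addresses, "name")
for row in rows:
    print(row)
```

A real provider would push this down to SQL rather than loop in the VM, as the message notes; the nested loop is only the simplest way to state the semantics.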
> >> And of course, all of this is strategized into a SQL call so its
> >> joins aren’t necessarily computed using TP4-VM resources.
> >> Anywho — what I hope to realize is the relationship between “links”
> >> (graph) and “joins” (tables). How can we make (bytecode-wise at
> >> least) RDBMS join operations and graph traversal operations ‘the
> >> same.’?
> >>      Singleton: Integer, String, Float, Double, etc.
> >>      Collection: List, Map (Vertex, Table, Document)
> >>      Linkable: Vertex, Table
> >> Vertices and Tables can be “linked.” Unlike Collections, they don’t
> >> maintain a “parent/child” relationship with the objects they
> >> reference. What does this mean……….?
> >> Take care,
> >> Marko.
> >> http://rredux.com
