Hey Josh, I’m digging what you are saying, but the pictures didn’t come through for me ? … Can you provide them again (or if dev@ is filtering them, can you give me URLs to them)?
Thanks, Marko. > On Apr 21, 2019, at 12:58 PM, Joshua Shinavier <j...@fortytwo.net> wrote: > > On the subject of "reified joins", maybe be a picture will be worth a few > words. As I said in the thread > <https://groups.google.com/d/msg/gremlin-users/_s_DuKW90gc/Xhp5HMfjAQAJ> on > property graph standardization, if you think of vertex labels, edge labels, > and property keys as types, each with projections to two other types, there > is a nice analogy with relations of two columns, and this analogy can be > easily extended to hyper-edges. Here is what the schema of the TinkerPop > classic graph looks like if you make each type (e.g. Person, Project, knows, > name) into a relation: > > > > I have made the vertex types salmon-colored, the edge types yellow, the > property types green, and the data types blue. The "o" and "I" columns > represent the out-type (e.g. out-vertex type of Person) and in-type (e.g. > property value type of String) of each relation. More than two arrows from a > column represent a coproduct, e.g. the out-type of "name" is Person OR > Project. Now you can think of out() and in() as joins of two tables on a > primary and foreign key. > > We are not limited to "out" and "in", however. Here is the ternary > relationship (hyper-edge) from hyper-edge slide > <https://www.slideshare.net/joshsh/a-graph-is-a-graph-is-a-graph-equivalence-transformation-and-composition-of-graph-data-models-129403012/49> > of my Graph Day preso, which has three columns/roles/projections: > > > > I have drawn Says in light blue to indicate that it is a generalized element; > it has projections other than "out" and "in". Now the line between relations > and edges begins to blur. E.g. in the following, is PlaceEvent a vertex or a > property? > > > > With the right type system, we can just speak of graph elements, and use > "vertex", "edge", "property" when it is convenient. In the relational model, > they are relations. If you materialize them in a relational database, they > are rows. In any case, you need two basic graph traversal operations: > project() -- forward traversal of the arrows in the above diagrams. Takes you > from an element to a component like in-vertex. > select() -- reverse traversal of the arrows. Allows you to answer questions > like "in which Trips is John Doe the rider?" > > Josh > > > On Fri, Apr 19, 2019 at 10:03 AM Marko Rodriguez <okramma...@gmail.com > <mailto:okramma...@gmail.com>> wrote: > Hello, > > I agree with everything you say. Here is my question: > > Relational database — join: Table x Table x equality function -> Table > Graph database — traverser: Vertex x edge label -> Vertex > > I want a single function that does both. The only think was to represent > traverser() in terms of join(): > > Graph database — traverser: Vertices x Vertex x equality function -> > Vertices > > For example, > > V().out(‘address’) > > ==> > > g.join(V().hasLabel(‘person’).as(‘a’) > V().hasLabel(‘addresses’).as(‘b’)). > by(‘name’).select(?address vertex?) > > That is, join the vertices with themselves based on some predicate to go from > vertices to vertices. > > However, I would like instead to transform the relational database join() > concept into a traverser() concept. Kuppitz and I were talking the other day > about a link() type operator that says: “try and link to this thing in some > specified way.” .. ?? The problem we ran into is again, “link it to what?” > > - in graph, the ‘to what’ is hardcoded so you don’t need to specify > anything. > - in rdbms, the ’to what’ is some other specified table. > > So what does the link() operator look like? > > —— > > Some other random thoughts…. > > Relational databases join on the table (the whole collection) > Graph databases traverser on the vertex (an element of the whole collection) > > We can make a relational database join on single row (by providing a filter > to a particular primary key). This is the same as a table with one row. > Likewise, for graph in the join() context above: > > V(1).out(‘address’) > > ==> > > g.join(V(1).as(‘a’) > V().hasLabel(‘addresses’).as(‘b’)). > by(‘name’).select(?address vertex?) > > More thoughts please…. > > Marko. > > http://rredux.com <http://rredux.com/> <http://rredux.com/ > <http://rredux.com/>> > > > > > > On Apr 19, 2019, at 4:20 AM, pieter martin <pieter.mar...@gmail.com > > <mailto:pieter.mar...@gmail.com>> wrote: > > > > Hi, > > The way I saw it is that the big difference is that graph's have > > reified joins. This is both a blessing and a curse. > > A blessing because its much easier (less text to type, less mistakes, > > clearer semantics...) to traverse an edge than to construct a manual > > join.A curse because there are almost always far more ways to traverse > > a data set than just by the edges some architect might have considered > > when creating the data set. Often the architect is not the domain > > expert and the edges are a hardcoded layout of the dataset, which > > almost certainly won't survive the real world's demands. In graphs, if > > their are no edges then the data is not reachable, except via indexed > > lookups. This is the standard engineering problem of database design, > > but it is important and useful that data can be traversed, joined, > > without having reified edges. > > In Sqlg at least, but I suspect it generalizes, I want to create the > > notion of a "virtual edge". Which in meta data describes the join and > > then the standard to(direction, "virtualEdgeName") will work. > > In a way this is precisely to keep the graphy nature of gremlin, i.e. > > traversing edges, and avoid using the manual join syntax you described. > > CheersPieter > > > > On Thu, 2019-04-18 at 14:15 -0600, Marko Rodriguez wrote: > >> Hi, > >> *** This is mainly for Kuppitz, but if others care. > >> Was thinking last night about relational data and Gremlin. The T() > >> step returns all the tables in the withStructure() RDBMS database. > >> Tables are ‘complex values’ so they can't leave the VM (only a simple > >> ‘toString’). > >> Below is a fake Gremlin session. (and these are just ideas…) tables > >> -> a ListLike of rows rows -> a MapLike of primitives > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin> > >> g.T(‘people’)==>t[people]gremlin> > >> g.T(‘people’).values()==>r[people:1]==>r[people:2]==>r[people:3]greml > >> in> > >> g.T(‘people’).values().asMap()==>{name:marko,age:29}==>{name:kuppitz, > >> age:10}==>{name:josh,age:35}gremlin> > >> g.T(‘people’).values().has(‘age’,gt(20))==>r[people:1]==>r[people:3]g > >> remlin> > >> g.T(‘people’).values().has(‘age’,gt(20)).values(‘name’)==>marko==>jos > >> h > >> Makes sense. Nice that values() and has() generally apply to all > >> ListLike and MapLike structures. Also, note how asMap() is the > >> valueMap() of TP4, but generalizes to anything that is MapLike so it > >> can be turned into a primitive form as a data-rich result from the > >> VM. > >> gremlin> g.T()==>t[people]==>t[addresses]gremlin> > >> g.T(‘addresses’).values().asMap()==>{name:marko,city:santafe}==>{name > >> :kuppitz,city:tucson}==>{name:josh,city:desertisland}gremlin> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)). by(se > >> lect(‘a’).value(’name’).is(eq(select(‘b’).value(’name’))). > >> values().asMap()==>{a.name:marko,a.age:29,b.name:marko,b.city:santafe > >> }==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.city:tucson}==>{a.name > >> <http://a.name/>: > >> josh,a.age:35,b.name:josh,b.city:desertisland}gremlin> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)). by(’n > >> ame’). // shorthand for equijoin on name > >> column/key values().asMap()==>{a.name:marko,a.age:29,b.name > >> <http://b.name/> > >> :marko,b.city:santafe}==>{a.name:kuppitz,a.age:10,b.name:kuppitz,b.ci > >> <http://b.ci/> > >> ty:tucson}==>{a.name:josh,a.age:35,b.name:josh,b.city:desertisland}gr > >> emlin> > >> g.join(T(‘people’).as(‘a’),T(‘addresses’).as(‘b’)). by(’n > >> ame’)==>t[people<-name->addresses] // without asMap(), just the > >> complex value ‘toString'gremlin> > >> And of course, all of this is strategized into a SQL call so its > >> joins aren’t necessarily computed using TP4-VM resources. > >> Anywho — what I hope to realize is the relationship between “links” > >> (graph) and “joins” (tables). How can we make (bytecode-wise at > >> least) RDBMS join operations and graph traversal operations ‘the > >> same.’? > >> Singleton: Integer, String, Float, Double, etc. Collection: > >> List, Map (Vertex, Table, Document) Linkable: Vertex, Table > >> Vertices and Tables can be “linked.” Unlike Collections, they don’t > >> maintain a “parent/child” relationship with the objects they > >> reference. What does this mean……….? > >> Take care,Marko. > >> http://rredux.com <http://rredux.com/> <http://rredux.com/ > >> <http://rredux.com/>> <http://rredux.com/ <http://rredux.com/> > >> <http://rredux.com/ <http://rredux.com/>>>