[DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Marko Rodriguez Mon, 15 Apr 2019 05:06:51 -0700

Hello,

I have a consolidated approach to handling data structures in TP4. I would 
appreciate any feedback you may have.


        1. Every object processed by TinkerPop has a TinkerPop-specific type.
                - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath, TList, 
…
                - BENEFIT #1: A universal type system will protect us from 
language platform peculiarities (e.g. Python long vs Java long).
                - BENEFIT #2: The serialization format is constrained and 
consistent across all language platforms (no more coming across a 
MySpecialClass).
        2. All primitive T-type data can be directly accessed via get().
                - TBoolean.get() -> java.lang.Boolean | System.Boolean | ...
                - TLong.get() -> java.lang.Long | System.Int64 | ...
                - TString.get() -> java.lang.String | System.String | …
                - TList.get() -> java.util.ArrayList | .. // can only contain 
primitives
                - TMap.get() -> java.util.LinkedHashMap | .. // can only 
contain primitives
                - ...
        3. All complex T-types have no methods! (except those afforded by 
Object)
                - TVertex: no accessible methods.
                - TEdge: no accessible methods.
                - TRow: no accessible methods.
                - TDocument: no accessible methods.
                - TDocumentArray: no accessible methods. // a document list 
field that can contain complex objects
                - ...
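
Points 2 and 3 can be sketched roughly as follows, in Python. This is only an 
illustration: the base classes TPrimitive and TComplex, the host_type field, 
and the choice to hold host-language values inside TList are assumptions of 
the sketch, not part of any existing TinkerPop API.

```python
class TPrimitive:
    """Base for primitive T-types: the only accessor is get()."""
    host_type = object  # host-language type this T-type wraps (hypothetical field)

    def __init__(self, value):
        # bool is a subclass of int in Python, so reject it explicitly unless
        # this T-type really wraps bool (one of those platform peculiarities).
        if isinstance(value, bool) and self.host_type is not bool:
            raise TypeError(f"{type(self).__name__} cannot wrap a bool")
        if not isinstance(value, self.host_type):
            raise TypeError(
                f"{type(self).__name__} cannot wrap {type(value).__name__}")
        self._value = value

    def get(self):
        return self._value


class TBoolean(TPrimitive):
    host_type = bool


class TLong(TPrimitive):
    host_type = int


class TString(TPrimitive):
    host_type = str


class TList(TPrimitive):
    host_type = list

    def __init__(self, value):
        super().__init__(value)
        # Assumption: elements are host-language primitives, never complex objects.
        if not all(isinstance(v, (bool, int, str)) for v in value):
            raise TypeError("TList can only contain primitives")


class TComplex:
    """Base for complex T-types: deliberately no public methods."""


class TVertex(TComplex):
    pass


class TEdge(TComplex):
    pass
```

With this shape, TLong(5).get() hands back a plain host integer, while a 
TVertex exposes nothing beyond what every object already has.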

REQUIREMENT #1: We need to be able to support multiple graphdbs in the same 
query.
                - e.g., read from JanusGraph and write to Neo4j.
REQUIREMENT #2: We need to make sure complex objects cannot be queried 
client-side for properties/edges/etc. data.
                - e.g., vertices are universally assumed to be “detached.”
REQUIREMENT #3: We no longer want to maintain a structure test suite. 
Operational semantics should be verified via Bytecode -> Processor/Structure.
                - i.e., the only way to read/write vertices is via Bytecode as 
complex T-types don’t have APIs.
REQUIREMENT #4: We should support other database data structures besides graph.
                - e.g., reading from MySQL and writing to JanusGraph.

———

Assume the following TraversalSource:

g.withStructure(JanusGraphStructure.class, config1).
  withStructure(Neo4jStructure.class, config2)

Now, assume the following traversal fragment:

        outE('knows').has('stars',5).inV()

This would initially be written to Bytecode as:

        [[outE,knows],[has,stars,5],[inV]]

A decoration strategy realizes that there are two structures registered in the 
Bytecode source instructions and would rewrite the above as:

        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]

A JanusGraph strategy would rewrite this as:

        
        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]

A Neo4j strategy would rewrite this as:

        
        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
        
A finalization strategy would rewrite this as:

        
        [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]

Now, when a TVertex gets to this CFunction, its type is checked. If it is a 
JanusVertex, it goes down the JanusGraph-specific instruction branch. If the 
type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.

        REQUIREMENT #1 SOLVED
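
The rewrite chain above can be sketched in Python with bytecode modeled as 
nested lists. The function names (decorate, add_branch, finalize, 
select_branch) are hypothetical and only mirror the strategy roles described 
in this email; they are not an existing TP4 API.

```python
GENERIC = [["type", "TVertex"]]  # predicate guarding the generic branch


def decorate(bytecode):
    """Decoration strategy: wrap the fragment in a choose keyed on TVertex."""
    return ["choose", GENERIC, bytecode]


def add_branch(choose, type_name, instructions):
    """Provider strategy: append a (type predicate, branch) pair."""
    return choose + [[["type", type_name]], instructions]


def finalize(choose):
    """Finalization strategy: drop the generic branch, keep provider branches."""
    out = ["choose"]
    for i in range(1, len(choose), 2):
        predicate, branch = choose[i], choose[i + 1]
        if predicate != GENERIC:
            out += [predicate, branch]
    return out


def select_branch(choose, type_name):
    """Runtime dispatch: pick the branch whose predicate matches the type."""
    for i in range(1, len(choose), 2):
        if choose[i] == [["type", type_name]]:
            return choose[i + 1]
    raise LookupError(f"no branch for {type_name}")


fragment = [["outE", "knows"], ["has", "stars", 5], ["inV"]]
rewritten = decorate(fragment)
rewritten = add_branch(rewritten, "JanusVertex",
                       [["jg:vertexCentric", "out", "knows", "stars", 5]])
rewritten = add_branch(rewritten, "Neo4jVertex",
                       [["neo:outE", "knows"], ["neo:has", "stars", 5],
                        ["neo:inV"]])
final = finalize(rewritten)
```

Here final has the same shape as the finalized bytecode shown above, and 
select_branch(final, "Neo4jVertex") yields the neo:-prefixed instructions.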

The last instruction of the root bytecode cannot return a complex object. If 
it does, an exception is thrown: g.V() is illegal, while g.V().id() is legal. 
Complex objects do not exist outside the TP4-VM. Only primitives can cross the 
VM-client barrier. If you want vertex property data, for example, you have to 
access it and return it within the traversal, e.g., g.V().valueMap().
        BENEFIT #1: Language variant implementations are simple. Just 
primitives.
        BENEFIT #2: The serialization specification is simple. Just primitives. 
(also, note that Bytecode is just a TList of primitives! — though TBytecode 
will exist.)
        BENEFIT #3: The concept of a “DetachedVertex” is universally assumed.

        REQUIREMENT #2 SOLVED
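
One way to picture the barrier is a check applied to every result before it 
leaves the VM. The sketch below does this at runtime in Python; whether TP4 
would check results at runtime or reject the bytecode statically is not 
settled here, and the names PRIMITIVES, is_primitive, emit, and Vertex are 
hypothetical.

```python
PRIMITIVES = (bool, int, str)  # assumed host-side primitive types


def is_primitive(value):
    """True if value is a primitive, or a list/map built only from primitives."""
    if isinstance(value, PRIMITIVES):
        return True
    if isinstance(value, list):
        return all(is_primitive(v) for v in value)
    if isinstance(value, dict):
        return all(is_primitive(k) and is_primitive(v)
                   for k, v in value.items())
    return False


class Vertex:
    """Stand-in for a provider's complex object inside the VM."""


def emit(results):
    """Pass results across the VM-client barrier, rejecting complex objects."""
    results = list(results)
    for result in results:
        if not is_primitive(result):
            raise TypeError(
                f"{type(result).__name__} cannot leave the VM; "
                "return ids or property values instead")
    return results
```

Under this sketch, emitting ids or a valueMap()-style dictionary succeeds, 
while emitting a Vertex raises.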

It is completely up to the structure provider to use structure-specific 
instructions for dealing with their particular TVertex. They will have to 
provide CFunction implementations for out, in, both, has, outE, inE, bothE, 
drop, property, value, id, label … (seems like a lot, but out/in/both could be 
one parameterized CFunction).
        BENEFIT #1: No more structure/ API and structure/ test suite.
        BENEFIT #2: The structure provider has full control of where the vertex 
data is stored (cached in memory or fetched from the db or a cut vertex or …). No 
assumptions are made by the TP4-VM.
        BENEFIT #3: The structure provider can safely assume their vertices 
will not be accessed outside the TP4-VM (outside the processor).

        REQUIREMENT #3 SOLVED
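
As a rough illustration of "out/in/both could be one parameterized CFunction", 
here is a toy provider in Python backed by an in-memory edge list. ToyVertex, 
EDGES, make_adjacent, and the "toy:" instruction prefix are all invented for 
this sketch.

```python
class ToyVertex:
    """Opaque to the VM: only this provider's CFunctions read _vid."""

    def __init__(self, vid):
        self._vid = vid


# (source id, label, target id): a stand-in for the provider's storage.
EDGES = [(1, "knows", 2), (1, "created", 3), (4, "knows", 1)]


def make_adjacent(direction):
    """Build the out, in, or both CFunction from one parameterized body."""
    def cfunction(vertex, label):
        result = []
        for src, lbl, dst in EDGES:
            if lbl != label:
                continue
            if direction in ("out", "both") and src == vertex._vid:
                result.append(ToyVertex(dst))
            if direction in ("in", "both") and dst == vertex._vid:
                result.append(ToyVertex(src))
        return result
    return cfunction


# Provider-specific instruction names mapped to their implementations.
CFUNCTIONS = {
    "toy:out": make_adjacent("out"),
    "toy:in": make_adjacent("in"),
    "toy:both": make_adjacent("both"),
    "toy:id": lambda vertex: vertex._vid,
}
```

The VM only ever looks up an instruction name and calls the function; where 
the vertex data actually lives stays the provider's business.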

We can support TRow for relational databases. A TRow’s data is accessible via 
the instructions has, hasKey, value, property, id, ... The location of the data 
in TRow is completely up to the structure provider and its strategy analysis 
(if only ’name’ is accessed, then SELECT name FROM ...). We can easily support 
TDocument for document databases. A TDocument’s data is accessible via the 
instructions has, hasKey, value, property, id, … A value() could return yet 
another TDocument (or a TDocumentArray containing TDocuments).

Supporting a new complex type is simply a function of asking: 

        “Does the TP4 VM instruction set have the requisite instruction-types 
(semantically) to manipulate this structure?”

We are no longer playing the language-specific object API game. We are playing 
the language-agnostic VM instruction game. The TP4-VM instruction set is the 
sole determiner of what complex objects can be processed. (i.e. what data 
structures can be processed without impedance mismatch).

        REQUIREMENT #4 SOLVED
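
The TRow strategy analysis mentioned above (only fetch the columns the 
bytecode touches) might look something like this in Python. The helper names 
and the table argument are hypothetical, only projection is handled (a has() 
filter is not turned into a WHERE clause), and identifiers are not quoted or 
escaped, so treat it purely as a sketch.

```python
# Instructions whose first argument names a row key.
KEYED = {"has", "hasKey", "value", "property"}


def accessed_keys(bytecode):
    """Collect, in first-seen order, the keys the instructions touch."""
    keys = []
    for instruction in bytecode:
        op, *args = instruction
        if op in KEYED and args and args[0] not in keys:
            keys.append(args[0])
    return keys


def to_select(table, bytecode):
    """Build a SELECT that projects only the accessed columns."""
    keys = accessed_keys(bytecode)
    columns = ", ".join(keys) if keys else "*"
    return f"SELECT {columns} FROM {table}"
```

For example, to_select("people", [["value", "name"]]) produces 
"SELECT name FROM people", and bytecode that touches no keys falls back to 
"SELECT * FROM people".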

———

The TP4-VM (and, in turn, Gremlin) can naturally support:

        1. Property graphs: as currently supported in TP3.
        2. RDF graphs: id() is a URI | Literal. g.V(1).value(‘foaf:name’) 
returns multi/meta-properties *or* g.V(1).out(‘foaf:name’) returns vertices 
whose id()s are xsd:string literals.
        3. Hypergraphs: inV() can return more than one vertex.
        4. Undirected graphs: in() and out() throw exceptions. Only both() 
works.
        5. Meta-properties: value(‘name’) can return a TVertexProperty  (a 
special complex object that is structure provider specific — and that is okay!).
        6. Multi-properties: value(‘name’) can return a TPropertyArray of 
TVertexProperty objects.

This means that the same instruction can behave differently for different 
structures. This is okay as there can be property graph, RDF, hypergraph, etc. 
test suites.
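
To make the "same instruction, different behavior" point concrete, here is a 
small Python sketch of the undirected-graph case, where only both() works. 
UnsupportedInstruction and undirected_cfunctions are made-up names, and plain 
vertex ids stand in for complex objects to keep it short.

```python
class UnsupportedInstruction(Exception):
    """Raised when a structure has no meaning for an instruction."""


def undirected_cfunctions(edges):
    """CFunctions for an undirected graph given as (a, b) id pairs."""
    def both(vid):
        # Neighbors are whichever endpoint is not the starting vertex.
        return sorted({b if a == vid else a
                       for a, b in edges if vid in (a, b)})

    def directed(_vid):
        raise UnsupportedInstruction(
            "in() and out() are undefined on an undirected graph")

    return {"both": both, "out": directed, "in": directed}


cfunctions = undirected_cfunctions([(1, 2), (3, 1), (2, 3)])
```

A property-graph provider would register working out and in functions under 
the same instruction names, which is why each structure family would get its 
own test suite.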

Since complex objects don’t leave the TP4-VM barrier, providers can create any 
complex objects they want — they just have to have corresponding strategies to 
create provider-unique bytecode instructions (and thus, CFunctions) for those 
complex objects.

Finally, there are a few problems to work out:
        - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]” 
representation. Is that bad? Perhaps not.
        - What is the nature of a TPath? It’s complex, but we want to return it.
        - g.V().id() on an RDF graph can return a URI. Is a URI “simple”? No, 
the set of simple types should never grow! Thus, URI => String. Is that wack?
        - Do we add g.R() and g.D() to Gremlin to type-support TRow and 
TDocument objects? g.V() would be weird :( … Hmmmm?
                - However, there are only so many data structures……. or are 
there? TMatrix, TXML, …. whoa.

Thanks for reading,
Marko.

http://rredux.com
