Re: [DISCUSS] Primitive Types, Complex Types, and their Entailments in TP4

Stephen Mallette Tue, 16 Apr 2019 09:54:13 -0700

>
> > I'd also wonder about how we treat subgraph() and tree()? could those be
> > a List<TPath> somehow??
>
> Yes, Tree is List<TPath>. Subgraph….hmmmm….shooting from the hip: you
> don’t get back a graph, it’s stored in:
>
> g.withStructure(TinkerGraphStructure.class, config1)
>
> That is, the subgraph is written to one of the registered structures. You
> can then query it like any other registered structure. Remember, in TP4, we
> will support an arbitrary number of structures associated with a Bytecode
> source.
>


I just thought of something interesting: if we can subgraph() into a
TinkerGraph that way, then the opposite is true as well, right? Like, you
could pull a subgraph(), do some mutations to it locally, then later write
some Gremlin to merge that subgraph back into its parent as a single
transaction. I suppose the nature of a "single transaction" would be
specific to each graph provider, but it's still neat to think about.

On Mon, Apr 15, 2019 at 2:19 PM Marko Rodriguez wrote:

> Hello Stephen,
>
> > I'd also wonder about how we treat subgraph() and tree()? could those be
> > a List<TPath> somehow??
>
> Yes, Tree is List<TPath>. Subgraph….hmmmm….shooting from the hip: you
> don’t get back a graph, it’s stored in:
>
> g.withStructure(TinkerGraphStructure.class, config1)
>
> That is, the subgraph is written to one of the registered structures. You
> can then query it like any other registered structure. Remember, in TP4, we
> will support an arbitrary number of structures associated with a Bytecode
> source.
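A minimal sketch of the "Tree is List<TPath>" idea, assuming a path is nothing more than a list of primitives (a stand-in for a TPath) and using a hypothetical TreeFromPaths helper. Folding the paths into nested maps, with shared prefixes merged, recovers the familiar tree() shape, so only lists of paths would need to cross the wire.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: TreeFromPaths is a hypothetical helper, not a TP4 API.
// Each path is a plain List<Object> of primitives standing in for a TPath.
final class TreeFromPaths {

    // Fold paths into nested maps, merging shared prefixes into one branch.
    static Map<Object, Object> toTree(List<List<Object>> paths) {
        Map<Object, Object> root = new LinkedHashMap<>();
        for (List<Object> path : paths) {
            Map<Object, Object> node = root;
            for (Object step : path) {
                @SuppressWarnings("unchecked")
                Map<Object, Object> child =
                        (Map<Object, Object>) node.computeIfAbsent(step, k -> new LinkedHashMap<>());
                node = child;
            }
        }
        return root;
    }
}
```

For example, the paths [1,2,3], [1,2,4] and [1,5] fold into {1={2={3={}, 4={}}, 5={}}}.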
>
> > isn't a URI a complex type? that list is expected to grow? maybe all
> > complex types have simple type representations?
>
> The problem with every complex type having a simple type representation is
> that the serializer will have to know about complex types (as objects).
> This is just more code for Python, JavaScript, Java, etc. to maintain. If
> the serialization format is ONLY primitives, and primitives come from a
> static set of ~10 types, then writing, testing, and maintaining serializers
> in other languages will be trivial.
>
>         Bytecode in [a nested list of primitives]
>         Traversers out [a collection of coefficient-wrapped primitives]
>
> Everything communicated over the wire is primitive! Basic. (TTraverser
> will have to be primitive, where get() returns a coefficient [bulk] and
> primitive [object] pair).
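The "only primitives over the wire" rule can be sketched as a recursive check a serializer might run before writing, plus a traverser that is just a (bulk, object) pair. WireCheck and WireTraverser are hypothetical names, and the primitive set below is an assumed stand-in for the "static set of ~10 types"; the thread does not enumerate that set.

```java
import java.util.List;
import java.util.Map;

// Sketch only: WireCheck and WireTraverser are hypothetical names, and the
// primitive set is an assumed stand-in for TP4's static set of types.
final class WireCheck {

    // True if the value is built solely from the assumed primitive set.
    static boolean isPrimitive(Object value) {
        if (value instanceof Boolean || value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double || value instanceof String) {
            return true;
        }
        if (value instanceof List) {
            for (Object item : (List<?>) value) {
                if (!isPrimitive(item)) return false;
            }
            return true;
        }
        if (value instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!isPrimitive(e.getKey()) || !isPrimitive(e.getValue())) return false;
            }
            return true;
        }
        return false; // anything else (e.g. a vertex object) may not cross the wire
    }
}

// A traverser on the wire: a coefficient (bulk) paired with a primitive object.
final class WireTraverser {
    final long bulk;
    final Object object;

    WireTraverser(long bulk, Object object) {
        if (!WireCheck.isPrimitive(object)) {
            throw new IllegalArgumentException("not a primitive: " + object);
        }
        this.bulk = bulk;
        this.object = object;
    }
}
```

Note that bytecode itself passes the same check, since it is only nested lists of strings and numbers.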
>
> > sorry, if some of these questions/ideas are a bit half-cocked, but i read
> > this really fast and won't be at my laptop for the rest of the day and
> > wanted to get some thoughts out. i'm really really interested in seeing
> > this aspect of TP done "right"….
>
> No worries. Thanks for replying.
>
> Some random ideas I was having.
>
>         - TXML: Assume an XML database. out() would be the children tags.
> value() would be the tag attribute value. label() would be the tag type. In
> other words, there is a clean mapping from the instructions to XML.
>         - TMatrix: Assume a database of n×m matrices. The math() instruction
> will be augmented to support matrix multiplication. A matrix is a table
> with rows and columns. We would need some nice instructions for that.
>         - TJPEG: Assume a database of graphics. Does our instruction set
> have instructions that are useful for manipulating images? Probably need
> row/column type instructions like TMatrix.
>         - TObject: Assume an object database. value() are primitive
> fields. out() is object fields. id() is unique object identifier. label()
> is object class. has() is a primitive field filter.
>         - TTimeSeries: ? I don’t know anything about time series
> databases, but the question remains…do our instructions make sense for this
> data structure?
>         - https://en.wikipedia.org/wiki/List_of_data_structures
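The TObject mapping can be approximated with plain Java reflection. This is a hypothetical sketch, not a TP4 API: it assumes public fields and treats primitive-typed fields as value(), object-typed fields as out(), the class name as label(), and an equality check on a primitive field as has(). id() is left out of the sketch.

```java
import java.lang.reflect.Field;

// Sketch only: TObjectInstructions is a hypothetical name, not a TP4 API.
// Assumes objects expose public fields.
final class TObjectInstructions {

    private static Field field(Object obj, String key) {
        try {
            return obj.getClass().getField(key);
        } catch (NoSuchFieldException e) {
            throw new IllegalArgumentException("no public field: " + key, e);
        }
    }

    private static Object read(Field f, Object obj) {
        try {
            return f.get(obj);
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    // Primitive fields: Java primitives, boxed numbers, Boolean, and String.
    private static boolean isPrimitive(Field f) {
        Class<?> t = f.getType();
        return t.isPrimitive() || t == String.class || t == Boolean.class
                || Number.class.isAssignableFrom(t);
    }

    // value(key): read a primitive-typed field.
    static Object value(Object obj, String key) {
        Field f = field(obj, key);
        if (!isPrimitive(f)) throw new IllegalArgumentException(key + " is an object field; use out()");
        return read(f, obj);
    }

    // out(key): follow a field that references another object.
    static Object out(Object obj, String key) {
        Field f = field(obj, key);
        if (isPrimitive(f)) throw new IllegalArgumentException(key + " is a primitive field; use value()");
        return read(f, obj);
    }

    // label(): the object's class name.
    static String label(Object obj) {
        return obj.getClass().getSimpleName();
    }

    // has(key, expected): filter on a primitive field.
    static boolean has(Object obj, String key, Object expected) {
        return expected.equals(value(obj, key));
    }
}

// Hypothetical domain objects for the usage example.
class Address {
    public String city = "Santa Fe";
}

class Person {
    public String name = "marko";
    public int age = 29;
    public Address address = new Address();
}
```

Calling value(p, "address") fails by design, mirroring the split between primitive fields and object fields.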
>
> The point being: I’m trying to think of oddball data structures and then
> see whether the TP4 instruction set is general enough to encompass the
> operations used by those structures.
>
> The beautiful thing is that providers can create as many complex types as
> they want. These types are always contained within the TP4-VM and thus
> require no changes to the serialization format or to the corresponding
> objects in the deserializing language. Imagine some XML database out there is using
> the TP4-VM, with the XPath language compiling to TP4 bytecode, and is
> processing their XML documents in real-time (Pipes/Rx), near-time
> (Flink/Akka), or batch-time (Spark/Hadoop). The TP4-VM has a life beyond
> graph! What a wonderful asset to the entire space of data processing!
>
> …now think of the RDF community using the TP4-VM. SPARQL will be
> W3C-compliant and can execute in real-time, near-time, batch-time, etc.
> What a useful technology to adopt for your RDF triple-store. I could see
> Stardog using TP4 for their batch processing. I could see Jena or OpenRDF
> importing TP4 to provide different SPARQL execution engines to their
> triple-store providers.
>
> The TP4 virtual machine may just turn out to be a technological
> masterpiece.
>
> Marko.
>
> http://rredux.com
>
> >
> > On Mon, Apr 15, 2019 at 8:06 AM Marko Rodriguez wrote:
> >
> >> Hello,
> >>
> >> I have a consolidated approach to handling data structures in TP4. I would
> >> appreciate any feedback you may have.
> >>
> >>        1. Every object processed by TinkerPop has a TinkerPop-specific type.
> >>                - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath, TList, …
> >>                - BENEFIT #1: A universal type system will protect us from
> >> language platform peculiarities (e.g. Python long vs Java long).
> >>                - BENEFIT #2: The serialization format is constrained and
> >> consistent across all language platforms (no more coming across a
> >> MySpecialClass).
> >>        2. All primitive T-type data can be directly accessed via get().
> >>                - TBoolean.get() -> java.lang.Boolean | System.Boolean | ...
> >>                - TLong.get() -> java.lang.Long | System.Int64 | ...
> >>                - TString.get() -> java.lang.String | System.String | …
> >>                - TList.get() -> java.util.ArrayList | .. // can only contain primitives
> >>                - TMap.get() -> java.util.LinkedHashMap | .. // can only contain primitives
> >>                - ...
> >>        3. All complex T-types have no methods! (except those afforded by Object)
> >>                - TVertex: no accessible methods.
> >>                - TEdge: no accessible methods.
> >>                - TRow: no accessible methods.
> >>                - TDocument: no accessible methods.
> >>                - TDocumentArray: no accessible methods. // a document list field that can contain complex objects
> >>                - ...
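Points 2 and 3 describe an asymmetry: primitive T-types unwrap into host-language values via get(), while complex T-types expose nothing. A minimal sketch of that shape, reusing the type names above but with interfaces and constructors assumed for illustration (not TP4 code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: names follow the proposal; these definitions are assumptions.
interface TType {}

// Primitive T-types can be unwrapped into the host language via get().
interface TPrimitive<T> extends TType {
    T get();
}

final class TLong implements TPrimitive<Long> {
    private final long value;
    TLong(long value) { this.value = value; }
    public Long get() { return value; }
}

final class TString implements TPrimitive<String> {
    private final String value;
    TString(String value) { this.value = value; }
    public String get() { return value; }
}

// A TList may only hold primitives: the constructor accepts nothing else.
final class TList implements TPrimitive<List<Object>> {
    private final List<Object> values = new ArrayList<>();
    TList(TPrimitive<?>... items) {
        for (TPrimitive<?> item : items) values.add(item.get());
    }
    // Returns a defensive copy as a java.util.ArrayList.
    public List<Object> get() { return new ArrayList<>(values); }
}

// Complex T-types declare no methods beyond Object's; only provider
// CFunctions inside the VM know how to read them.
abstract class TVertex implements TType {}
```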
> >>
> >> REQUIREMENT #1: We need to be able to support multiple graphdbs in the same query.
> >>                - e.g., read from JanusGraph and write to Neo4j.
> >> REQUIREMENT #2: We need to make sure complex objects cannot be queried
> >> client-side for properties/edges/etc. data.
> >>                - e.g., vertices are universally assumed to be “detached.”
> >> REQUIREMENT #3: We no longer want to maintain a structure test suite.
> >> Operational semantics should be verified via Bytecode -> Processor/Structure.
> >>                - i.e., the only way to read/write vertices is via
> >> Bytecode, as complex T-types don’t have APIs.
> >> REQUIREMENT #4: We should support other database data structures besides graph.
> >>                - e.g., reading from MySQL and writing to JanusGraph.
> >>
> >> ———
> >>
> >> Assume the following TraversalSource:
> >>
> >> g.withStructure(JanusGraphStructure.class, config1).
> >>  withStructure(Neo4jStructure.class, config2)
> >>
> >> Now, assume the following traversal fragment:
> >>
> >>        outE('knows').has('stars',5).inV()
> >>
> >> This would initially be written to Bytecode as:
> >>
> >>        [[outE,knows],[has,stars,5],[inV]]
> >>
> >> A decoration strategy realizes that there are two structures registered in
> >> the Bytecode source instructions and would rewrite the above as:
> >>
> >>        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]
> >>
> >> A JanusGraph strategy would rewrite this as:
> >>
> >>        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]
> >>
> >> A Neo4j strategy would rewrite this as:
> >>
> >>        [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
> >>
> >> A finalization strategy would rewrite this as:
> >>
> >>        [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
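Because bytecode is just nested lists of primitives, each strategy above is a list transformation. A minimal sketch, assuming instructions are modeled as List<Object> and using hypothetical method names. The provider-specific branches are passed in rather than derived; deriving them is where a real JanusGraph or Neo4j strategy would do its analysis.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: bytecode is a List<Object> of instructions, each instruction a
// List<Object> (opcode followed by arguments). Method names are hypothetical.
final class ChooseRewrites {

    // Decoration: wrap the fragment in [choose,[[type,TVertex]],fragment].
    static List<Object> decorate(List<Object> fragment) {
        List<Object> choose = new ArrayList<>();
        choose.add("choose");
        choose.add(List.of(List.of("type", "TVertex")));
        choose.add(fragment);
        return choose;
    }

    // Provider strategy: append a [[type,X]] predicate and its specialized branch.
    static List<Object> addBranch(List<Object> choose, String vertexType, List<Object> branch) {
        List<Object> copy = new ArrayList<>(choose);
        copy.add(List.of(List.of("type", vertexType)));
        copy.add(branch);
        return copy;
    }

    // Finalization: drop the generic TVertex predicate and branch (indices 1 and 2).
    static List<Object> finalizeChoose(List<Object> choose) {
        List<Object> copy = new ArrayList<>(choose);
        copy.remove(2);
        copy.remove(1);
        return copy;
    }
}
```

Running decorate, two addBranch calls, and finalizeChoose on [[outE,knows],[has,stars,5],[inV]] reproduces the finalized bytecode shown above.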
> >>
> >> Now, when a TVertex gets to this CFunction, it will check its type. If it’s
> >> a JanusVertex, it goes down the JanusGraph-specific instruction branch. If
> >> the type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.
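At runtime the finalized choose behaves like a type switch. A sketch with hypothetical JanusVertex/Neo4jVertex marker classes and a hypothetical ChooseFunction, where each branch is a function from the incoming object to its outputs. Throwing on an unmatched type is an assumption; the thread does not say what happens in that case.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Sketch only: ChooseFunction and the vertex classes are hypothetical
// stand-ins for a provider-compiled choose CFunction.
final class ChooseFunction {
    private final Map<Class<?>, Function<Object, List<Object>>> branches = new LinkedHashMap<>();

    ChooseFunction branch(Class<?> type, Function<Object, List<Object>> body) {
        branches.put(type, body);
        return this;
    }

    // Route the incoming object down the first branch whose type matches.
    List<Object> apply(Object incoming) {
        for (Map.Entry<Class<?>, Function<Object, List<Object>>> e : branches.entrySet()) {
            if (e.getKey().isInstance(incoming)) {
                return e.getValue().apply(incoming);
            }
        }
        throw new IllegalStateException("no branch for " + incoming.getClass().getName());
    }
}

class JanusVertex {}
class Neo4jVertex {}
```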
> >>
> >>        REQUIREMENT #1 SOLVED
> >>
> >> The last instruction of the root bytecode cannot return a complex object.
> >> If it does, an exception is thrown: g.V() is illegal; g.V().id() is legal.
> >> Complex objects do not exist outside the TP4-VM. Only primitives can leave
> >> the VM-client barrier. If you want vertex property data (e.g.), you have to
> >> access it and return it within the traversal, e.g., g.V().valueMap().
> >>        BENEFIT #1: Language variant implementations are simple. Just
> >> primitives.
> >>        BENEFIT #2: The serialization specification is simple. Just
> >> primitives. (also, note that Bytecode is just a TList of primitives! —
> >> though TBytecode will exist.)
> >>        BENEFIT #3: The concept of a “DetachedVertex” is universally
> >> assumed.
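The VM-client barrier can be sketched as a final check on whatever the root traversal emits. Complex is a hypothetical marker interface for complex T-types, and this check runs on results at runtime; a real VM might instead reject the bytecode up front by inspecting its last instruction.

```java
import java.util.List;
import java.util.Map;

// Sketch only: Complex and Barrier are hypothetical names for illustration.
interface Complex {}

final class Barrier {

    // Reject any result that is, or contains, a complex object.
    static Object release(Object result) {
        if (result instanceof Complex) {
            throw new IllegalStateException(
                    "complex object cannot leave the VM; project it (e.g. id(), valueMap()) first");
        }
        if (result instanceof List) {
            for (Object item : (List<?>) result) release(item);
        } else if (result instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) result).entrySet()) {
                release(e.getKey());
                release(e.getValue());
            }
        }
        return result;
    }
}
```

In this sketch, a vertex emitted by g.V() would trip the check, while the Long emitted by g.V().id() passes through unchanged.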
> >>
> >>        REQUIREMENT #2 SOLVED
> >>
> >> It is completely up to the structure provider to use structure-specific
> >> instructions for dealing with their particular TVertex. They will have to
> >> provide CFunction implementations for out, in, both, has, outE, inE, bothE,
> >> drop, property, value, id, label … (seems like a lot, but out/in/both could
> >> be one parameterized CFunction).
> >>        BENEFIT #1: No more structure/ API and structure/ test suite.
> >>        BENEFIT #2: The structure provider has full control of where the
> >> vertex data is stored (cached in memory, fetched from the db, a cut
> >> vertex, …). No assumptions are made by the TP4-VM.
> >>        BENEFIT #3: The structure provider can safely assume their
> >> vertices will not be accessed outside the TP4-VM (outside the processor).
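The remark that out/in/both could be one parameterized CFunction might look like this. The toy in-memory edge list, the Direction enum, and AdjacencyFunction are all hypothetical; a provider would read from its own storage (cache, database, cut vertex) instead.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: a toy in-memory edge list stands in for provider storage.
enum Direction { OUT, IN, BOTH }

final class Edge {
    final long outV;
    final String label;
    final long inV;

    Edge(long outV, String label, long inV) {
        this.outV = outV;
        this.label = label;
        this.inV = inV;
    }
}

// One CFunction parameterized by direction and label replaces separate out/in/both.
final class AdjacencyFunction {
    private final Direction direction;
    private final String label;
    private final List<Edge> edges;

    AdjacencyFunction(Direction direction, String label, List<Edge> edges) {
        this.direction = direction;
        this.label = label;
        this.edges = edges;
    }

    // Return adjacent vertex ids in edge-list order.
    List<Long> apply(long vertexId) {
        List<Long> result = new ArrayList<>();
        for (Edge e : edges) {
            if (!e.label.equals(label)) continue;
            if (direction != Direction.IN && e.outV == vertexId) result.add(e.inV);
            if (direction != Direction.OUT && e.inV == vertexId) result.add(e.outV);
        }
        return result;
    }
}
```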
> >>
> >>        REQUIREMENT #3 SOLVED
> >>
> >> We can support TRow for relational databases. A TRow’s data is accessible
> >> via the instructions has, hasKey, value, property, id, ... The location of
> >> the data in TRow is completely up to the structure provider and its
> >> strategy analysis (if only 'name' is accessed, then SELECT name FROM ...).
> >> We can easily support TDocument for document databases. A TDocument’s data
> >> is accessible via the instructions has, hasKey, value, property, id, … A
> >> value() could return yet another TDocument (or a TDocumentArray containing
> >> TDocuments).
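The 'name' example describes projection pushdown. A sketch under stated assumptions: bytecode is modeled as nested lists, only the first argument of value/has/hasKey is treated as a column name, the table name is supplied by the caller, and identifiers are not quoted or escaped. RowProjection is a hypothetical name.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Sketch only: a toy projection-pushdown pass over bytecode modeled as nested
// lists. The SQL generation is illustrative and does no quoting or escaping.
final class RowProjection {

    static String selectFor(String table, List<List<Object>> bytecode) {
        Set<String> columns = new LinkedHashSet<>();
        for (List<Object> instruction : bytecode) {
            Object op = instruction.get(0);
            // Collect the key referenced by value/has/hasKey as a needed column.
            if ((op.equals("value") || op.equals("has") || op.equals("hasKey"))
                    && instruction.size() > 1) {
                columns.add(String.valueOf(instruction.get(1)));
            }
        }
        String projection = columns.isEmpty() ? "*" : String.join(", ", columns);
        return "SELECT " + projection + " FROM " + table;
    }
}
```

Pushing the has() predicates into a WHERE clause would be the natural next step; this sketch only narrows the column list.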
> >>
> >> Supporting a new complex type is simply a function of asking:
> >>
> >>        “Does the TP4 VM instruction set have the requisite
> >> instruction-types (semantically) to manipulate this structure?”
> >>
> >> We are no longer playing the language-specific object API game. We are
> >> playing the language-agnostic VM instruction game. The TP4-VM instruction
> >> set is the sole determiner of what complex objects can be processed (i.e.,
> >> what data structures can be processed without impedance mismatch).
> >>
> >>        REQUIREMENT #4 SOLVED
> >>
> >> ———
> >>
> >> The TP4-VM (and, in turn, Gremlin) can naturally support:
> >>
> >>        1. Property graphs: as currently supported in TP3.
> >>        2. RDF graphs: id() is a URI | Literal. g.V(1).value(‘foaf:name’)
> >> returns multi/meta-properties *or* g.V(1).out(‘foaf:name’) returns vertices
> >> whose id()s are xsd:string literals.
> >>        3. Hypergraphs: inV() can return more than one vertex.
> >>        4. Undirected graphs: in() and out() throw exceptions. Only both()
> >> works.
> >>        5. Meta-properties: value(‘name’) can return a TVertexProperty (a
> >> special complex object that is structure-provider-specific, and that is
> >> okay!).
> >>        6. Multi-properties: value(‘name’) can return a TPropertyArray of
> >> TVertexProperty objects.
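Items 3 and 4 mean the same instruction can behave differently per structure. A sketch with hypothetical UndirectedGraph and HyperEdge classes: the undirected structure rejects out()/in() and serves only both(), while a hyperedge's inV() may yield several vertices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Sketch only: UndirectedGraph and HyperEdge are hypothetical illustrations.
final class UndirectedGraph {
    private final List<long[]> edges = new ArrayList<>();

    UndirectedGraph connect(long a, long b) {
        edges.add(new long[] {a, b});
        return this;
    }

    List<Long> out(long v) {
        throw new UnsupportedOperationException("out() is undefined on an undirected graph; use both()");
    }

    List<Long> in(long v) {
        throw new UnsupportedOperationException("in() is undefined on an undirected graph; use both()");
    }

    // both(): every neighbor reachable over an undirected edge.
    List<Long> both(long v) {
        List<Long> result = new ArrayList<>();
        for (long[] e : edges) {
            if (e[0] == v) result.add(e[1]);
            else if (e[1] == v) result.add(e[0]);
        }
        return result;
    }
}

// In a hypergraph, inV() may legitimately yield more than one vertex.
final class HyperEdge {
    final Set<Long> outV;
    final Set<Long> inV;

    HyperEdge(Set<Long> outV, Set<Long> inV) {
        this.outV = outV;
        this.inV = inV;
    }

    Set<Long> inV() { return inV; }
}
```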
> >>
> >> This means that the same instruction can behave differently for different
> >> structures. This is okay as there can be property graph, RDF, hypergraph,
> >> etc. test suites.
> >>
> >> Since complex objects don’t leave the TP4-VM barrier, providers can create
> >> any complex objects they want; they just need corresponding
> >> strategies to create provider-unique bytecode instructions (and thus,
> >> CFunctions) for those complex objects.
> >>
> >> Finally, there are a few problems to work out:
> >>        - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]”
> >> representation. Is that bad? Perhaps not.
> >>        - What is the nature of a TPath? It’s complex, but we want to
> >> return it.
> >>        - g.V().id() on an RDF graph can return a URI. Is a URI “simple”?
> >> No, the set of simple types should never grow!…. thus, URI => String. Is
> >> that wack?
> >>        - Do we add g.R() and g.D() to Gremlin to type-support TRow and
> >> TDocument objects? g.V() would be weird :( … Hmmmm?
> >>                - However, there are only so many data structures……. or
> >> are there? TMatrix, TXML, …. whoa.
> >>
> >> Thanks for reading,
> >> Marko.
> >>
> >> http://rredux.com
>
>
