> > I'd also wonder about how we treat subgraph() and tree(). Could those be a List<TPath> somehow??
>
> Yes, Tree is List<TPath>. Subgraph….hmmmm….shooting from the hip: you don’t get back a graph, it’s stored in:
>
> g.withProcessor(TinkerGraphStructure.class, config1)
>
> That is, the subgraph is written to one of the registered structures. You can then query it like any other registered structure. Remember, in TP4, we will support an arbitrary number of structures associated with a Bytecode source.
>
I just thought of something interesting: if we can subgraph() into a TinkerGraph that way, then the opposite is true as well, right? Like, you could pull a subgraph(), do some mutations to it locally, then later write some Gremlin to merge that subgraph back into its parent as a single transaction. I suppose the nature of a "single transaction" would be specific to each graph provider, but it's still neat to think about.

On Mon, Apr 15, 2019 at 2:19 PM Marko Rodriguez <[email protected]> wrote:
> Hello Stephen,
>
> > I'd also wonder about how we treat subgraph() and tree(). Could those be a List<TPath> somehow??
>
> Yes, Tree is List<TPath>. Subgraph….hmmmm….shooting from the hip: you don’t get back a graph, it’s stored in:
>
> g.withProcessor(TinkerGraphStructure.class, config1)
>
> That is, the subgraph is written to one of the registered structures. You can then query it like any other registered structure. Remember, in TP4, we will support an arbitrary number of structures associated with a Bytecode source.
>
> > Isn't a URI a complex type? Is that list expected to grow? Maybe all complex types have simple type representations?
>
> The problem with every complex type having a simple type representation is that the serializer will have to know about complex types (as objects). This is just more code for Python, JavaScript, Java, etc. to maintain. If the serialization format is ONLY primitives, and primitives come from a static set of ~10 types, then writing, testing, and maintaining serializers in other languages will be trivial.
>
> Bytecode in [a nested list of primitives]
> Traversers out [a collection of coefficient-wrapped primitives]
>
> Everything communicated over the wire is primitive! Basic. (TTraverser will have to be primitive, where get() returns a coefficient [bulk] and primitive [object] pair.)
>
> > Sorry if some of these questions/ideas are a bit half-cocked, but I read this really fast and won't be at my laptop for the rest of the day and wanted to get some thoughts out. I'm really, really interested in seeing this aspect of TP done "right"….
>
> No worries. Thanks for replying.
>
> Some random ideas I was having:
>
>   - TXML: Assume an XML database. out() would be the children tags. value() would be the tag attribute value. label() would be the tag type. In other words, there is a clean mapping from the instructions to XML.
>   - TMatrix: Assume a database of n×m matrices. The math() instruction will be augmented to support matrix multiplication. A matrix is a table with rows and columns. We would need some nice instructions for that.
>   - TJPEG: Assume a database of graphics. Does our instruction set have instructions that are useful for manipulating images? Probably need row/column type instructions like TMatrix.
>   - TObject: Assume an object database. value() are primitive fields. out() is object fields. id() is the unique object identifier. label() is the object class. has() is a primitive field filter.
>   - TTimeSeries: ? I don’t know anything about time series databases, but the question remains…do our instructions make sense for this data structure?
>   - https://en.wikipedia.org/wiki/List_of_data_structures
>
> The point being: I’m trying to think of oddball data structures and then trying to see if the TP4 instruction set is sufficiently general to encompass operations used by those structures.
>
> The beautiful thing is that providers can create as many complex types as they want. These types are always contained within the TP4-VM and thus require no changes to the serialization format and respective objects in the deserializing language.
> Imagine some XML database out there using the TP4-VM, with the XPath language compiling to TP4 bytecode, and processing its XML documents in real-time (Pipes/Rx), near-time (Flink/Akka), or batch-time (Spark/Hadoop). The TP4-VM has a life beyond graph! What a wonderful asset to the entire space of data processing!
>
> …now think of the RDF community using the TP4-VM. SPARQL will be W3C-compliant and can execute in real-time, near-time, batch-time, etc. What a useful technology to adopt for your RDF triple-store. I could see Stardog using TP4 for their batch processing. I could see Jena or OpenRDF importing TP4 to provide different SPARQL execution engines to their triple-store providers.
>
> The TP4 virtual machine may just turn out to be a technological masterpiece.
>
> Marko.
>
> http://rredux.com
>
> > On Mon, Apr 15, 2019 at 8:06 AM Marko Rodriguez <[email protected]> wrote:
> >
> >> Hello,
> >>
> >> I have a consolidated approach to handling data structures in TP4. I would appreciate any feedback you may have.
> >>
> >> 1. Every object processed by TinkerPop has a TinkerPop-specific type.
> >>    - TLong, TInteger, TString, TMap, TVertex, TEdge, TPath, TList, …
> >>    - BENEFIT #1: A universal type system will protect us from language platform peculiarities (e.g. Python long vs Java long).
> >>    - BENEFIT #2: The serialization format is constrained and consistent across all language platforms (no more coming across a MySpecialClass).
> >> 2. All primitive T-type data can be directly accessed via get().
> >>    - TBoolean.get() -> java.lang.Boolean | System.Boolean | ...
> >>    - TLong.get() -> java.lang.Long | System.Int64 | ...
> >>    - TString.get() -> java.lang.String | System.String | ...
> >>    - TList.get() -> java.lang.ArrayList | ... // can only contain primitives
> >>    - TMap.get() -> java.lang.LinkedHashMap | ...
> >>      // can only contain primitives
> >>    - ...
> >> 3. All complex T-types have no methods! (except those afforded by Object)
> >>    - TVertex: no accessible methods.
> >>    - TEdge: no accessible methods.
> >>    - TRow: no accessible methods.
> >>    - TDocument: no accessible methods.
> >>    - TDocumentArray: no accessible methods. // a document list field that can contain complex objects
> >>    - ...
> >>
> >> REQUIREMENT #1: We need to be able to support multiple graphdbs in the same query.
> >>    - e.g., read from JanusGraph and write to Neo4j.
> >> REQUIREMENT #2: We need to make sure complex objects cannot be queried client-side for properties/edges/etc. data.
> >>    - e.g., vertices are universally assumed to be “detached.”
> >> REQUIREMENT #3: We no longer want to maintain a structure test suite. Operational semantics should be verified via Bytecode -> Processor/Structure.
> >>    - i.e., the only way to read/write vertices is via Bytecode, as complex T-types don’t have APIs.
> >> REQUIREMENT #4: We should support other database data structures besides graph.
> >>    - e.g., reading from MySQL and writing to JanusGraph.
> >>
> >> ---
> >>
> >> Assume the following TraversalSource:
> >>
> >>   g.withStructure(JanusGraphStructure.class, config1).
> >>     withStructure(Neo4jStructure.class, config2)
> >>
> >> Now, assume the following traversal fragment:
> >>
> >>   outE('knows').has('stars',5).inV()
> >>
> >> This would initially be written to Bytecode as:
> >>
> >>   [[outE,knows],[has,stars,5],[inV]]
> >>
> >> A decoration strategy realizes that there are two structures registered in the Bytecode source instructions and would rewrite the above as:
> >>
> >>   [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]]]
> >>
> >> A JanusGraph strategy would rewrite this as:
> >>
> >>   [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]]]
> >>
> >> A Neo4j strategy would rewrite this as:
> >>
> >>   [choose,[[type,TVertex]],[[outE,knows],[has,stars,5],[inV]],[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
> >>
> >> A finalization strategy would rewrite this as:
> >>
> >>   [choose,[[type,JanusVertex]],[[jg:vertexCentric,out,knows,stars,5]],[[type,Neo4jVertex]],[[neo:outE,knows],[neo:has,stars,5],[neo:inV]]]
> >>
> >> Now, when a TVertex gets to this CFunction, it will check its type: if it's a JanusVertex, it goes down the JanusGraph-specific instruction branch. If the type is Neo4jVertex, it goes down the Neo4j-specific instruction branch.
> >>
> >> REQUIREMENT #1 SOLVED
> >>
> >> The last instruction of the root bytecode cannot return a complex object. If it does, an exception is thrown. g.V() is illegal. g.V().id() is legal. Complex objects do not exist outside the TP4-VM. Only primitives can leave the VM-client barrier. If you want vertex property data (for example), you have to access it and return it within the traversal, e.g., g.V().valueMap().
> >>    BENEFIT #1: Language variant implementations are simple. Just primitives.
> >>    BENEFIT #2: The serialization specification is simple. Just primitives.
> >>       (Also, note that Bytecode is just a TList of primitives! Though TBytecode will exist.)
> >>    BENEFIT #3: The concept of a “DetachedVertex” is universally assumed.
> >>
> >> REQUIREMENT #2 SOLVED
> >>
> >> It is completely up to the structure provider to use structure-specific instructions for dealing with their particular TVertex. They will have to provide CFunction implementations for out, in, both, has, outE, inE, bothE, drop, property, value, id, label … (seems like a lot, but out/in/both could be one parameterized CFunction).
> >>    BENEFIT #1: No more structure/ API and structure/ test suite.
> >>    BENEFIT #2: The structure provider has full control of where the vertex data is stored (cached in memory, fetched from the db, a cut vertex, …). No assumptions are made by the TP4-VM.
> >>    BENEFIT #3: The structure provider can safely assume their vertices will not be accessed outside the TP4-VM (outside the processor).
> >>
> >> REQUIREMENT #3 SOLVED
> >>
> >> We can support TRow for relational databases. A TRow’s data is accessible via the instructions has, hasKey, value, property, id, ... The location of the data in TRow is completely up to the structure provider and its strategy analysis (if only 'name' is accessed, then SELECT name FROM ...). We can easily support TDocument for document databases. A TDocument’s data is accessible via the instructions has, hasKey, value, property, id, … A value() could return yet another TDocument (or a TDocumentArray containing TDocuments).
> >>
> >> Supporting a new complex type is simply a function of asking:
> >>
> >>   “Does the TP4 VM instruction set have the requisite instruction-types (semantically) to manipulate this structure?”
> >>
> >> We are no longer playing the language-specific object API game. We are playing the language-agnostic VM instruction game.
> >> The TP4-VM instruction set is the sole determiner of what complex objects can be processed (i.e., what data structures can be processed without impedance mismatch).
> >>
> >> REQUIREMENT #4 SOLVED
> >>
> >> ---
> >>
> >> The TP4-VM (and, in turn, Gremlin) can naturally support:
> >>
> >> 1. Property graphs: as currently supported in TP3.
> >> 2. RDF graphs: id() is a URI | Literal. g.V(1).value('foaf:name') returns multi/meta-properties *or* g.V(1).out('foaf:name') returns vertices whose id()s are xsd:string literals.
> >> 3. Hypergraphs: inV() can return more than one vertex.
> >> 4. Undirected graphs: in() and out() throw exceptions. Only both() works.
> >> 5. Meta-properties: value('name') can return a TVertexProperty (a special complex object that is structure-provider specific, and that is okay!).
> >> 6. Multi-properties: value('name') can return a TPropertyArray of TVertexProperty objects.
> >>
> >> This means that the same instruction can behave differently for different structures. This is okay, as there can be property graph, RDF, hypergraph, etc. test suites.
> >>
> >> Since complex objects don’t leave the TP4-VM barrier, providers can create any complex objects they want; they just have to have corresponding strategies to create provider-unique bytecode instructions (and thus, CFunctions) for those complex objects.
> >>
> >> Finally, there are a few problems to work out:
> >>    - There is no way to yield a “v[1]” or “e[3][v[1]-knows->v[2]]” representation. Is that bad? Perhaps not.
> >>    - What is the nature of a TPath? It's complex, but we want to return it.
> >>    - g.V().id() on an RDF graph can return a URI. Is a URI “simple”? No, the set of simple types should never grow!…. thus, URI => String. Is that wack?
> >>    - Do we add g.R() and g.D() to Gremlin to type-support TRow and TDocument objects? g.V() would be weird :( … Hmmmm?
> >>    - However, there are only so many data structures……. or are there? TMatrix, TXML, …. whoa.
> >>
> >> Thanks for reading,
> >> Marko.
> >>
> >> http://rredux.com
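Marko's primitive-only wire format (Bytecode in as a nested list of primitives, traversers out as a bulk coefficient paired with a primitive) can be sketched in Java as below. This is a minimal illustration under stated assumptions: WireCheck and WireTraverser are hypothetical names, and the particular set of scalar types stands in for TP4's "static set of ~10 types", which the thread does not enumerate.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: only values built from a small, fixed set of primitive types may cross the wire.
final class WireCheck {
    // Assumed scalar set for illustration; TP4's actual primitive list may differ.
    private static final Set<Class<?>> SCALARS = Set.of(
            Boolean.class, Integer.class, Long.class, Float.class, Double.class, String.class);

    private WireCheck() {}

    // True for a scalar primitive, or a List/Map whose contents are (recursively) primitive.
    static boolean isPrimitive(Object obj) {
        if (obj == null) {
            return false; // assumption: null is not part of the wire format
        }
        if (SCALARS.contains(obj.getClass())) {
            return true;
        }
        if (obj instanceof List) {
            for (Object item : (List<?>) obj) {
                if (!isPrimitive(item)) {
                    return false;
                }
            }
            return true;
        }
        if (obj instanceof Map) {
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) obj).entrySet()) {
                if (!isPrimitive(entry.getKey()) || !isPrimitive(entry.getValue())) {
                    return false;
                }
            }
            return true;
        }
        return false; // anything else (e.g. a provider's vertex object) is complex
    }
}

// Hypothetical traverser as it would leave the VM: a bulk coefficient plus a primitive object.
final class WireTraverser {
    final long bulk;
    final Object object;

    WireTraverser(long bulk, Object object) {
        if (!WireCheck.isPrimitive(object)) {
            throw new IllegalArgumentException("complex object cannot cross the VM-client barrier: " + object);
        }
        this.bulk = bulk;
        this.object = object;
    }
}
```

With this shape, a client-side serializer only ever needs to handle the scalar set plus lists and maps of them, which is the maintenance win Marko is after.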
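Marko's TObject idea (value() for primitive fields, out() for object fields, label() for the class, has() as a primitive-field filter) maps fairly directly onto Java reflection. The sketch below is hypothetical: ObjectStructure, Person and Company are illustrative names, the set of "primitive" field types is an assumption, and id() is left out because Java has no built-in unique object identifier (System.identityHashCode is not guaranteed to be unique).

```java
import java.lang.reflect.Field;
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the TObject mapping: TP4-style instructions over plain Java objects.
final class ObjectStructure {
    // Assumed set of field value types treated as primitive.
    private static final Set<Class<?>> PRIMITIVES = Set.of(
            Boolean.class, Integer.class, Long.class, Double.class, String.class);

    private ObjectStructure() {}

    // label(): the object's class.
    static String label(Object obj) {
        return obj.getClass().getSimpleName();
    }

    // value(key): a field that holds a primitive.
    static Object value(Object obj, String key) {
        Object v = read(obj, key);
        if (v == null || !PRIMITIVES.contains(v.getClass())) {
            throw new IllegalArgumentException(key + " is not a primitive field");
        }
        return v;
    }

    // out(key): a field that holds another (complex) object.
    static Object out(Object obj, String key) {
        Object v = read(obj, key);
        if (v == null || PRIMITIVES.contains(v.getClass())) {
            throw new IllegalArgumentException(key + " is not an object field");
        }
        return v;
    }

    // has(key, expected): a filter on a primitive field.
    static boolean has(Object obj, String key, Object expected) {
        return Objects.equals(value(obj, key), expected);
    }

    // Reads a declared field reflectively, rethrowing reflection failures as unchecked.
    private static Object read(Object obj, String key) {
        try {
            Field field = obj.getClass().getDeclaredField(key);
            field.setAccessible(true);
            return field.get(obj);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no readable field " + key, e);
        }
    }
}

// Sample classes used to exercise the sketch.
class Company {
    String name = "Acme";
}

class Person {
    String name = "marko";
    int age = 29;
    Company employer = new Company();
}
```

For example, out(person, "employer") followed by value(..., "name") walks from a Person to its Company's name, the object-database analogue of out().values().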
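Points 1 to 3 of the original proposal (every value has a T-type, primitives expose get(), complex types expose nothing) could look roughly like this in Java. A sketch only: TType and TPrimitive are hypothetical names; TLong, TString, TList and TVertex follow the email, but their shapes here, including TList.get() unwrapping its elements to host values, are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical root of the T-type system: every value inside the VM is a TType.
interface TType {}

// Primitive T-types expose their host-language value through get().
interface TPrimitive<V> extends TType {
    V get();
}

final class TLong implements TPrimitive<Long> {
    private final long value;

    TLong(long value) {
        this.value = value;
    }

    @Override
    public Long get() {
        return value;
    }
}

final class TString implements TPrimitive<String> {
    private final String value;

    TString(String value) {
        this.value = value;
    }

    @Override
    public String get() {
        return value;
    }
}

// A TList accepts only primitives (enforced by the constructor's parameter type),
// so it is itself a primitive whose get() yields plain host values.
final class TList implements TPrimitive<List<Object>> {
    private final List<Object> values = new ArrayList<>();

    TList(List<? extends TPrimitive<?>> items) {
        for (TPrimitive<?> item : items) {
            values.add(item.get());
        }
    }

    @Override
    public List<Object> get() {
        return Collections.unmodifiableList(values);
    }
}

// Complex T-types expose no methods beyond Object's; a provider would subclass this,
// and its data would be reachable only through provider CFunctions inside the VM.
class TVertex implements TType {}
```

Because TVertex is not a TPrimitive, passing one to the TList constructor is a compile-time error, which mirrors the "can only contain primitives" rule.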
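The strategy sequence in the original message (decoration, then one rewrite per registered provider, then finalization) can be reproduced with bytecode modeled as nested Java lists. ChooseStrategies and its method names are hypothetical; real TP4 strategies would operate on TP4's own Bytecode type rather than raw lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical strategy pipeline over bytecode represented as nested lists.
final class ChooseStrategies {
    private ChooseStrategies() {}

    // Builds one mutable instruction (or instruction list) from its elements.
    static List<Object> inst(Object... args) {
        return new ArrayList<>(Arrays.asList(args));
    }

    // Decoration: wrap the fragment in a choose keyed on the generic TVertex type.
    static List<Object> decorate(List<Object> fragment) {
        return inst("choose", inst(inst("type", "TVertex")), fragment);
    }

    // Provider strategy: append a [[type,X]] predicate followed by its provider-specific branch.
    static void addBranch(List<Object> choose, String type, List<Object> branch) {
        choose.add(inst(inst("type", type)));
        choose.add(branch);
    }

    // Finalization: drop the generic TVertex predicate (index 1) and its branch (index 2).
    static void finalizeChoose(List<Object> choose) {
        choose.remove(2);
        choose.remove(1);
    }
}
```

Calling decorate on [[outE,knows],[has,stars,5],[inV]], then addBranch for JanusVertex and Neo4jVertex, then finalizeChoose yields the same finalized bytecode shown in the email (modulo the spaces Java's List.toString inserts).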
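At runtime, the finalized choose described above acts as a type switch: the traverser's object is tested against each branch's type predicate in order, and the first match selects the instructions to run. A minimal sketch, assuming plain Java classes stand in for provider vertex types; TypeChoose, JanusVertex and Neo4jVertex here are hypothetical, and branches are modeled as lists of instruction names only.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical provider-specific vertex classes (stand-ins for what JanusGraph/Neo4j would supply).
class JanusVertex {}

class Neo4jVertex {}

// Hypothetical sketch of the finalized choose CFunction: branch on the runtime type of the object.
final class TypeChoose {
    // Insertion order is kept so predicates are tried in the order they appear in the bytecode.
    private final Map<Class<?>, List<String>> branches = new LinkedHashMap<>();

    TypeChoose branch(Class<?> type, List<String> instructions) {
        branches.put(type, instructions);
        return this;
    }

    // Returns the instructions of the first branch whose type matches the object.
    List<String> select(Object object) {
        for (Map.Entry<Class<?>, List<String>> entry : branches.entrySet()) {
            if (entry.getKey().isInstance(object)) {
                return entry.getValue();
            }
        }
        throw new IllegalStateException("no branch for " + object.getClass().getName());
    }
}
```

Throwing when no branch matches is a design choice of this sketch; the thread does not say what the VM should do with an object whose type no registered structure claims.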
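The TRow remark that a provider's strategy analysis could push projections down (if only 'name' is accessed, select only that column) might be sketched as a scan over the bytecode. Assumptions: RowProjection is a hypothetical name, only has, hasKey, value and property are treated as key-bearing instructions, and the SQL is assembled by naive string concatenation purely for illustration.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical provider strategy: find the keys a TRow traversal touches and project only those columns.
final class RowProjection {
    // Instructions whose first argument names a row key (taken from the instruction list in the email).
    private static final Set<String> KEYED = Set.of("has", "hasKey", "value", "property");

    private RowProjection() {}

    // Collects referenced keys in first-seen order.
    static Set<String> accessedKeys(List<List<Object>> bytecode) {
        Set<String> keys = new LinkedHashSet<>();
        for (List<Object> instruction : bytecode) {
            if (instruction.size() > 1 && KEYED.contains(instruction.get(0))) {
                keys.add(String.valueOf(instruction.get(1)));
            }
        }
        return keys;
    }

    // Builds the SELECT a provider might issue; falls back to * when no key is referenced.
    static String toSql(String table, List<List<Object>> bytecode) {
        Set<String> keys = accessedKeys(bytecode);
        String columns = keys.isEmpty() ? "*" : String.join(", ", keys);
        return "SELECT " + columns + " FROM " + table;
    }
}
```

A fuller strategy would also push has() predicates into a WHERE clause and use parameterized queries; this sketch covers only the column projection.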
