Re: [DISCUSS] A Process Based Graph Reasoner

Marko Rodriguez Sat, 10 Oct 2015 10:09:04 -0700

Hello,

Your idea of embedding schema information into the graph structure is the 
pattern that RDF uses, where schema and data are all one data structure. Many 
years ago I had a client that used TinkerGraph to hold their home-brewed schema 
("the partitioned graph") and traversed it much like an RDF reasoner to 
effect CRUD operations on the graph database. Thus, each read/write to the 
graph database also involved various traversals against TinkerGraph. Since the 
schema didn't change much, it was just a GraphML file that each client loaded 
when it connected to the graph database. It worked well and they liked it.


In my experience with RDF, 90% of the benefit comes from a very tiny relational 
algebra (what AllegroGraph calls RDFS++); OWL is, in most situations, an 
overdose. The question then is: what is the most efficient way to implement a 
reasoner? While it is sexy to store the schema in the graph structure itself, 
it's not practical (in my opinion). What about storing it in a TinkerGraph 
parallel to the graph? Well, I would say Gremlin is expensive relative to 
basic Set/List operations. The proposal in the JIRA is for a schema represented 
in in-memory Set/List data structures.
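As a minimal sketch of what such a Set/List-backed schema might look like, two RDFS++-style facts can be held in plain Java collections: which edge labels are transitive, and which labels are sub-labels of others (an analogue of rdfs:subPropertyOf). The class and its methods (ReasonerSchema, transitive(), subLabelOf(), impliedBy()) are hypothetical names for illustration, not TinkerPop API.

```java
import java.util.*;

// Hypothetical in-memory reasoning schema: plain Sets and Maps, no graph traversals.
class ReasonerSchema {
    // Edge labels declared transitive (e.g. "ancestor").
    private final Set<String> transitiveLabels = new HashSet<>();
    // sub-label -> its direct super-labels (an rdfs:subPropertyOf analogue).
    private final Map<String, Set<String>> superLabels = new HashMap<>();

    ReasonerSchema transitive(String label) {
        transitiveLabels.add(label);
        return this;
    }

    ReasonerSchema subLabelOf(String subLabel, String superLabel) {
        superLabels.computeIfAbsent(subLabel, k -> new HashSet<>()).add(superLabel);
        return this;
    }

    boolean isTransitive(String label) {
        return transitiveLabels.contains(label);
    }

    // Every label whose edges imply an edge with the given label:
    // the label itself plus all direct and indirect sub-labels.
    Set<String> impliedBy(String label) {
        Set<String> result = new HashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(label);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!result.add(current)) continue; // already visited
            for (Map.Entry<String, Set<String>> e : superLabels.entrySet()) {
                if (e.getValue().contains(current)) todo.push(e.getKey());
            }
        }
        return result;
    }
}
```

With father and mother declared sub-labels of parent, and parent a sub-label of ancestor, impliedBy("ancestor") yields {ancestor, parent, father, mother}. Under these assumptions a strategy could expand out('ancestor') into an out() over all four labels. Each such question is answered by hash lookups and a tiny walk over labels, not by a Gremlin traversal.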

For example, if the "ancestor" label is declared transitive, then for the traversal

        g.V().out('ancestor').values('name')

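The ReasonerStrategy would do this:

        // Rewrite out('ancestor') into repeat(out('ancestor')).emit() (assumes an
        // out() step, as in the example), i.e. emit every vertex reachable via one
        // or more "ancestor" edges. getEdgeLabels() returns a String[].
        if (Arrays.asList(vertexStep.getEdgeLabels()).contains("ancestor")) {
          TraversalHelper.insertTraversal(vertexStep,
              __.repeat(__.out("ancestor")).emit().asAdmin(), traversal);
          traversal.removeStep(vertexStep);
        }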

Rippin' fast. Can an "RDFS++" reasoner be implemented with basic Set/List 
operations in Java? I bet so, and hence the ReasonerStrategy.build()…create() 
pattern articulated in the JIRA, which:

        1. Doesn't in any way mutate graph data in the user's graph.
        2. Isn't made inconsistent if the user leverages the graph system
           provider's specific APIs/query language.
        3. Can easily be used or not used (it's simply a Strategy) and thus is
           not global to all operations on the graph.
        4. Can (I believe) provide the typical *useful* Semantic Web community
           semantics to the property graph domain.

So, in conclusion: I concur, schema stuff is great and I would like to see how 
your tp3-contrib/ repository shakes out. Please do share links when you have 
them. Does the ticket about a "process reasoner" require a schema? No. The two 
concepts are related, but one is not foundational to the other. The process 
reasoner discussed only requires that property keys and edge labels exist, and 
given the Graph interface as it stands, we are sure that providers support this.

Thank you for your thoughts,
Marko.

http://markorodriguez.com

On Oct 9, 2015, at 4:18 PM, pieter-gmail wrote:

> Oy, so much to say,
> 
> Ontology is the "study of the nature of being" (here, of the graph).
> 
> The traditional notion of schema is a subset of that rather infinite
> understanding (ontology) and, I'd say, for many the starting point of any
> understanding.
> 
> I would surmise that a reasoning ontology would have to have some
> knowledge as to the nature (meta) of the graph. This would include which
> labels are associated with which, the multiplicity, uniqueness, order,
> ownership, constraints... It might be easy, as you say, but it is
> ubiquitous and the structural foundation of any ontology.
> 
> The problems you mention regarding different providers are something that,
> with time, success and confidence, might become less of an issue.
> 
> I am of the opinion that much of the above-mentioned ontological stuff
> is a mostly abstract concern for tp3: just an interface specification.
> Specifying uniqueness or whatever is an ontological concern; indexes,
> however, are an implementation concern of the provider. BTW, the same goes
> for full text search. Lucene's (or whatever technology's)
> features/limitations should not be the primary concern of tp3. Within
> reason, of course; there is no point in specifying that which no one can
> implement.
> 
> In some ways tp3 (or perhaps I) is confused about whether tp3 is an
> implementation or a specification. This concerns me a lot when I need to
> optimize tp3 steps: the more I optimize, the less tp3 code executes. Don't
> get me wrong, however; without the default implementation I would never
> even have started.
> 
> Another concern I have regarding all this is tp3's agnosticism with
> respect to typing. An ontology surely needs some knowledge about the
> types it supports and reasons over.
> 
> My own idea for implementing a schema model for tp3 is far more
> simplistic to start off with. I am toying with the idea of making it a
> sort of tp3-contrib lib. That way, for any graph implementation, an
> application higher up the stack will be able to access tp3 semantic
> schema information in an implementation-agnostic manner.
> 
> The basic idea is to have a special partitioned graph with limited
> schema information. The default implementation stays with current tp3
> semantics except for capturing the Java type of any property. Basically
> (I have not really thought about the details yet) a graph of
> vertexLabel->edgeLabel->vertexLabel with their respective properties and
> types.
> 
> Providers can then add custom features, like adding 'in' and 'out'
> properties to express multiplicity, order, constraints, transitiveness
> and... The stricter tp3 becomes in specifying the ontological nature of
> graphs, the richer the standard partitioned schema graph will become.
> However, it will always remain lazy and schemaless, with no need to
> specify anything upfront.
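A minimal sketch of the lazily captured vertexLabel->edgeLabel->vertexLabel schema described above, assuming it is fed on every write. LazySchema and its methods are hypothetical names, and plain Maps stand in for the partitioned schema graph. It keeps the first Java type seen for each property key; what should happen when a later write uses a different type is left open here.

```java
import java.util.*;

// Hypothetical lazily built schema: nothing is declared up front; it only
// grows as elements and properties are observed.
class LazySchema {
    // outLabel -> edgeLabel -> set of inLabels seen for that combination.
    private final Map<String, Map<String, Set<String>>> edges = new TreeMap<>();
    // element label -> property key -> Java type first observed for that key.
    private final Map<String, Map<String, Class<?>>> propertyTypes = new TreeMap<>();

    void observeEdge(String outLabel, String edgeLabel, String inLabel) {
        edges.computeIfAbsent(outLabel, k -> new TreeMap<>())
             .computeIfAbsent(edgeLabel, k -> new TreeSet<>())
             .add(inLabel);
    }

    void observeProperty(String label, String key, Object value) {
        // Record only the first type seen; conflicts are deliberately not handled.
        propertyTypes.computeIfAbsent(label, k -> new TreeMap<>())
                     .putIfAbsent(key, value.getClass());
    }

    Set<String> inLabels(String outLabel, String edgeLabel) {
        return edges.getOrDefault(outLabel, Collections.emptyMap())
                    .getOrDefault(edgeLabel, Collections.emptySet());
    }

    Class<?> typeOf(String label, String key) {
        return propertyTypes.getOrDefault(label, Collections.emptyMap()).get(key);
    }
}
```

Fed with writes from the TinkerPop "modern" graph, inLabels("person", "created") would return [software], and typeOf("person", "age") would return Integer.class.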
> 
> In your 'process reasoner' you explicitly specify the features of an
> edge; as far as I can see this is no different from what you would have
> to do with a 'structured reasoner'. In a default 'structured reasoner'
> there is nothing to specify, unless you wish to say that some label is
> 'transitive' or whatever. The time/space cost of capturing the schema,
> and of loading it when starting up an existing graph, is in general
> minimal, as the schema is very small compared to the actual data.
> Somewhere in the ether I have heard that SAP has something like 50000
> tables: a lot to understand, but not much to load in space and time. The
> schema partition should also be optional, probably even off by default.
> 
> To give you some indication of my own implementation issues with Sqlg:
> Sqlg now supports the Java 8 java.time types
> LocalDateTime/LocalDate/LocalTime/Duration/Period.
> Duration, Period and Integer are all stored as integers in the RDBMS;
> however, there is no way without some schema information to know whether
> some integer field represents a Duration, a Period or just an Integer.
> vertex.value("duration") should return a java.time.Duration but, alas,
> without additional schema support there is no way to know what the type
> of the field is.
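The Sqlg problem can be sketched as a decoder that needs a schema entry before it can turn a stored integer back into a Java value. TypedColumnDecoder is a hypothetical name, and so is the encoding: the sketch assumes a Duration is stored as whole seconds, which is not necessarily how Sqlg does it. Period is left out because its encoding is not described here.

```java
import java.time.Duration;
import java.util.*;

// Hypothetical decoder: without a schema entry, a raw integer column is ambiguous.
class TypedColumnDecoder {
    // Hypothetical schema lookup: property key -> declared Java type.
    private final Map<String, Class<?>> declaredTypes = new HashMap<>();

    TypedColumnDecoder declare(String key, Class<?> type) {
        declaredTypes.put(key, type);
        return this;
    }

    // Converts the raw integer read from the RDBMS into the declared type.
    // ASSUMPTION: a Duration is stored as whole seconds; the real encoding may differ.
    Object decode(String key, long raw) {
        Class<?> type = declaredTypes.get(key);
        if (type == null)
            throw new IllegalStateException(
                "no schema entry for '" + key + "': cannot tell a Duration from an Integer");
        if (type == Duration.class) return Duration.ofSeconds(raw);
        if (type == Integer.class) return Math.toIntExact(raw);
        throw new IllegalArgumentException("unsupported type " + type.getName());
    }
}
```

The same raw value 90 decodes to a 90-second Duration for a key declared as Duration.class and to the Integer 90 for a key declared as Integer.class; for an undeclared key the decoder can only fail.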
> 
> If tp3 decides to have an opinion regarding typing, I'd say Java
> primitives, arrays of primitives and java.time.* should be standard
> without much discussion.
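A sketch of a check for the type set proposed above: Java primitives (in boxed form, as property values arrive as Objects), arrays of primitives, and classes from the java.time package. ProposedStandardTypes is a hypothetical name. String is deliberately absent because the list above does not mention it.

```java
import java.util.*;

// Hypothetical whitelist check for the proposed standard property types.
class ProposedStandardTypes {
    private static final Set<Class<?>> BOXED_PRIMITIVES = new HashSet<>(Arrays.<Class<?>>asList(
            Boolean.class, Byte.class, Character.class, Short.class,
            Integer.class, Long.class, Float.class, Double.class));

    static boolean isStandard(Object value) {
        Class<?> c = value.getClass();
        if (BOXED_PRIMITIVES.contains(c)) return true;                      // 42, true, 'x', ...
        if (c.isArray() && c.getComponentType().isPrimitive()) return true; // int[], double[], ...
        Package p = c.getPackage();                                         // null for array classes
        return p != null && p.getName().equals("java.time");                // LocalDate, Duration, ...
    }
}
```

An Integer[] is rejected because its component type is not primitive, and classes in sub-packages such as java.time.format fall outside the check.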
> 
> Thanks
> Pieter
> 
> 
> 
> On 09/10/2015 21:35, Marko Rodriguez wrote:
>> Hello,
>> 
>> So this ticket is more about a reasoning ontology than it is about a data 
>> validation/verification/constraint schema.
>> 
>> The former is "easy" to do as it's a query-time model. The latter is more 
>> difficult as we would have to expose some sort of Schema interface for graph 
>> system providers to expose schema constraints. Furthermore, each provider 
>> tends to do things differently (much like with indices, which is why 
>> TinkerPop is agnostic to the concept of an index). For instance, Titan has a 
>> pretty rich schema model while Neo4j (I believe) only supports things like a 
>> UNIQUE constraint on, e.g., a name.
>> 
>> You could argue that a Schema system could be developed at the 
>> TraversalStrategy level, but then it starts to get hairy when people use 
>> "Blueprints" to write to the graph or the native interfaces of the 
>> underlying provider (e.g. using Cypher to write data). Now TinkerPop will 
>> think data is in one format, but it's in another.
>> 
>> Can you say more as to how you see a validation/verification/constraint 
>> model being specified/implemented in a provider-agnostic way for TinkerPop3?
>> 
>> Thanks,
>> Marko.
>> 
>> http://markorodriguez.com
>> 
>> On Oct 9, 2015, at 1:28 PM, pieter-gmail wrote:
>> 
>>> Hi,
>>> 
>>> Perhaps I am missing exactly what you are saying, but it seems to me
>>> Gremlin might become schema-aware.
>>> 
>>> This is something I consider crucial to understanding any data set.
>>> Perhaps it's from my background, but I generally fail to see how the
>>> NoSQL/NoSchema/Document crowd understand their data by looking at rows
>>> or documents or vertices without a picture of the schema.
>>> 
>>> The schema may be lazily created, but nonetheless all systems, I'd say,
>>> have an implicit schema which, imho, should be the starting point of any
>>> analysis.
>>> 
>>> This is true even if it's some random key put into a Redis instance.
>>> 
>>> While I am on the topic, even the tp3 modern graph, trivial as it may
>>> be, would be easier for me to 'get' if it was illustrated with a schema
>>> diagram before the graph itself was illustrated.
>>> 
>>> Cheers
>>> Pieter
>>> 
>>> On 09/10/2015 20:07, Marko Rodriguez wrote:
>>>> ardog4-fame on a blog post discussing how Gremlin can traverse 
>>>> ontologically implied edges in the Stardo
>> 
>
