Re: [DISCUSS] A Process Based Graph Reasoner

Marko Rodriguez Sun, 11 Oct 2015 08:09:46 -0700

This is pretty cool. Stardog rule syntax.

        http://docs.stardog.com/#_stardog_rules_syntax


Sort of like user-defined steps in Gremlin2.

Marko.

http://markorodriguez.com

On Oct 10, 2015, at 11:08 AM, Marko Rodriguez <[email protected]> wrote:

> Hello,
> 
> Your idea of embedding schema information into the graph structure is the 
> pattern that RDF uses, where schema and data are all one data structure. Many 
> years ago I had a client that was using TinkerGraph to hold their home-brewed 
> schema ("the partitioned graph") and was traversing it much like an 
> RDF-reasoner to effect CRUD operations on the graph database. Thus, each 
> read/write to the graph database was also various traversals against 
> TinkerGraph. Since the schema didn't change much, it was just a GraphML file 
> that each client loaded up when it connected to the graph database. It worked 
> well and they liked it.
> 
> In my experience with RDF, 90% of the benefit comes from a very tiny 
> relational algebra (what AllegroGraph calls RDFS++), as OWL is, in most 
> situations, an overdose. The question then is: what is the most efficient way 
> to implement a reasoner? While it is sexy to store the schema in the graph 
> structure itself, it's not practical (in my opinion). What about storing it in 
> a TinkerGraph parallel to the graph? Well, I would say, Gremlin is expensive 
> relative to basic Set/List operations. The proposal in the JIRA is for a 
> schema represented in in-memory Set/List data structures.
> 
>       g.V.out('ancestor').name
> 
> The ReasonerStrategy would do this:
> 
> if(vertexStep.getEdgeLabels().contains("ancestor")) {
>   TraversalHelper.insertTraversalAfter(vertexStep, __.repeat(vertexStep.clone()).emit(), traversal);
>   traversal.removeStep(vertexStep);
> }
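
To illustrate what the rewrite above achieves without depending on TinkerPop, here is a minimal sketch in plain Java collections. The class TransitiveLabelSketch and its toy adjacency map are hypothetical stand-ins (neither is part of TinkerPop or the JIRA proposal). It shows what out('ancestor') would return once 'ancestor' is declared transitive: every reachable vertex rather than only the direct neighbours. Unlike repeat(...).emit(), this sketch deduplicates results, so it also terminates on cycles.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the reasoner's in-memory schema; not a TinkerPop class.
final class TransitiveLabelSketch {
    // Edge labels declared transitive, held in a plain Set.
    private final Set<String> transitiveLabels = new HashSet<>();
    // label -> (vertex -> direct out-neighbours); a toy graph, not a real Graph.
    private final Map<String, Map<String, List<String>>> edges = new HashMap<>();

    void declareTransitive(String label) {
        transitiveLabels.add(label);
    }

    void addEdge(String from, String label, String to) {
        edges.computeIfAbsent(label, k -> new HashMap<>())
             .computeIfAbsent(from, k -> new ArrayList<>())
             .add(to);
    }

    // One hop for ordinary labels; every reachable vertex for transitive ones.
    Set<String> out(String start, String label) {
        Map<String, List<String>> adjacency =
                edges.getOrDefault(label, Collections.emptyMap());
        boolean transitive = transitiveLabels.contains(label);
        Set<String> result = new LinkedHashSet<>();
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(start);
        while (!frontier.isEmpty()) {
            String current = frontier.poll();
            for (String next : adjacency.getOrDefault(current, Collections.emptyList())) {
                // Only keep expanding when the label is transitive and the vertex is new.
                if (result.add(next) && transitive) {
                    frontier.add(next);
                }
            }
        }
        return result;
    }
}
```

The membership test on transitiveLabels plays the role of the getEdgeLabels().contains("ancestor") check: the decision to expand is a Set lookup and touches no graph data.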
> 
> Rippin' fast. Can an "RDFS++"-reasoner be implemented with basic Set/List 
> operations in Java? I bet so -- hence the ReasonerStrategy.build()…create() 
> pattern articulated in the JIRA, which:
> 
>       1. Doesn't in any way mutate graph data in the user's graph.
>       2. Isn't made inconsistent if the user leverages the graph system 
> provider's specific APIs/query language.
>       3. Can be easily used or not used (it's simply a Strategy) and thus, not 
> global to all operations on the graph.
>       4. Can (I believe) provide typical *useful* SemanticWeb community 
> semantics to the PropertyGraph domain.
> 
> So, in conclusion: I concur, Schema stuff is great and I would like to see 
> how your tp3-contrib/ repository shakes out. Please do share links when you 
> have them. Does the ticket about a "process reasoner" require a Schema? No. 
> The two concepts are related, but one is not foundational to the other. The 
> process reasoner discussed only requires that property keys and edge labels 
> exist, and by the Graph interface as it stands, we are sure that providers 
> support this.
> 
> Thank you for your thoughts,
> Marko.
> 
> http://markorodriguez.com
> 
> On Oct 9, 2015, at 4:18 PM, pieter-gmail <[email protected]> wrote:
> 
>> Oy, so much to say,
>> 
>> Ontology is "study of the nature of being" (of the graph)
>> 
>> The traditional notion of schema is a subset of the rather infinite
>> understanding (Ontology) and I'd say for many the starting point of any
>> understanding.
>> 
>> I would surmise that a reasoning ontology would have to have some
>> knowledge as to the nature (meta) of the graph. This would include which
>> labels are associated with which, the multiplicity, uniqueness, order,
>> ownership, constraints... It might be easy as you say but it is
>> ubiquitous and the structural foundation of any ontology.
>> 
>> The problems you mention regarding different providers is something that
>> with time, success and confidence might become less of an issue.
>> 
>> I am of the opinion that much of the above mentioned ontological stuff
>> is a mostly abstract concern for tp3. Just an interface specification.
>> Specifying uniqueness or whatever is an ontological concern; indexes,
>> however, are an implementation concern of the provider. BTW, the same goes
>> for full text search. Lucene or whatever technology's
>> features/limitations should not be the primary concern of tp3. Within
>> reason of course. No point in specifying that which no one can implement.
>> 
>> In some ways tp3 (or me) is confused about tp3 being an implementation
>> versus a specification. This concerns me a lot when I need to optimize
>> tp3 steps. The more I optimize the less tp3 code executes. Don't get me
>> wrong however, without the default implementation I would never even
>> have started.
>> 
>> Another concern I have regarding all this is tp's agnosticism with
>> respect to typing. An ontology would surely need to have some knowledge
>> about the types it supports and reasons over.
>> 
>> My own idea for implementing a schema model for tp3 is far more
>> simplistic to start off with. I am toying with the idea of making it a
>> sort of tp3-contrib lib. That way for any graph implementation an
>> application higher up the stack will be able to access tp3 semantic
>> schema information in an implementation-agnostic manner.
>> 
>> The basic idea is to have a special partitioned graph with limited
>> schema information. The default implementation stays with current tp3
>> semantics except for capturing the java type of any property. Basically
>> (not really thought about the details yet) a graph of
>> vertexLabel->edgeLabel->vertexLabel with their respective properties and
>> types.
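
One way to picture the schema graph described above is the following minimal sketch. It assumes a hypothetical class LazySchemaSketch (not part of tp3 or any tp3-contrib library) that records vertexLabel->edgeLabel->vertexLabel triples and the Java type of each property as data is observed, so nothing has to be declared upfront.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical schema holder; all names are illustrative only.
final class LazySchemaSketch {
    // vertex label -> property key -> first Java type observed for that key
    private final Map<String, Map<String, Class<?>>> vertexProperties = new TreeMap<>();
    // triples rendered as "outLabel-[edgeLabel]->inLabel"
    private final Set<String> edgeTriples = new TreeSet<>();

    // Called whenever a vertex is written; the schema grows lazily.
    void observeVertex(String label, Map<String, ?> properties) {
        Map<String, Class<?>> types =
                vertexProperties.computeIfAbsent(label, k -> new TreeMap<>());
        for (Map.Entry<String, ?> entry : properties.entrySet()) {
            types.putIfAbsent(entry.getKey(), entry.getValue().getClass());
        }
    }

    // Called whenever an edge is written between two labelled vertices.
    void observeEdge(String outLabel, String edgeLabel, String inLabel) {
        edgeTriples.add(outLabel + "-[" + edgeLabel + "]->" + inLabel);
    }

    // Returns the recorded type, or null when the key was never observed.
    Class<?> typeOf(String vertexLabel, String key) {
        return vertexProperties
                .getOrDefault(vertexLabel, Collections.emptyMap())
                .get(key);
    }

    Set<String> triples() {
        return Collections.unmodifiableSet(edgeTriples);
    }
}
```

For the tp3 modern graph, observing a 'person' vertex with an Integer 'age' and a person-created-software edge would leave the schema holding one typed property and one triple, which is the kind of picture a schema diagram would show.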
>> 
>> Providers can then add custom features like adding 'in', 'out' properties
>> to add multiplicity, order, constraints, transitiveness and... The
>> stricter tp3 becomes with specifying the ontological nature of graphs
>> the richer the standard partitioned schema graph will become. However it
>> will always remain lazy, schemaless, no need to specify anything upfront.
>> 
>> In your 'process reasoner' you explicitly specify the features of an
>> edge; as far as I can see this is no different from what you would have
>> to do with a 'structured reasoner'. In a default 'structured reasoner'
>> there is nothing to specify, unless you wish to say that some label is
>> 'transitive' or whatever. The time/space cost of capturing the schema and
>> of starting up an existing graph is in general minimal, as the schema is so
>> very small compared to the actual data. Somewhere in the ether I have
>> heard that SAP has something like 50000 tables. A lot to understand but
>> not much to load in space and time. The schema-partition should also be
>> optional, probably even off by default.
>> 
>> To give you some indication of my own implementation issue with sqlg:
>> Sqlg now supports Java 8 java.time
>> LocalDateTime/LocalDate/LocalTime/Duration/Period.
>> Duration and Period and Integers are stored as integers in the rdbms,
>> however there is no way without some schema information to know whether
>> some integer field represents a Duration, a Period or just an Integer.
>> vertex.value("duration") should return a java.time.Duration but alas
>> without additional schema support there is no way to know what the type
>> of the field is.
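
The ambiguity described above can be sketched as follows. PropertyTypeSketch is a hypothetical helper, not sqlg code, and the storage encoding (a Duration as seconds, a Period as days) is an assumption made only for this illustration. Without a declared type the raw number is all that can be returned; with one, the same number decodes to the intended java.time value.

```java
import java.time.Duration;
import java.time.Period;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper mapping a property key to its declared Java type.
final class PropertyTypeSketch {
    private final Map<String, Class<?>> declaredTypes = new HashMap<>();

    void declare(String key, Class<?> type) {
        declaredTypes.put(key, type);
    }

    // Decodes a raw integer column value using the declared type, if any.
    // Assumption for illustration: Duration is stored as seconds, Period as days.
    Object decode(String key, long raw) {
        Class<?> type = declaredTypes.get(key);
        if (type == Duration.class) {
            return Duration.ofSeconds(raw);
        }
        if (type == Period.class) {
            return Period.ofDays(Math.toIntExact(raw));
        }
        // No schema entry: the stored number cannot be interpreted further.
        return raw;
    }
}
```

Before declare("duration", Duration.class) is called, decode("duration", 90) yields the boxed number 90; afterwards it yields a Duration of 90 seconds.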
>> 
>> If tp3 decides to have an opinion regarding typing I'd say java
>> primitives, arrays of primitives and java.time.* should be standard
>> without much discussion.
>> 
>> Thanks
>> Pieter
>> 
>> 
>> 
>> On 09/10/2015 21:35, Marko Rodriguez wrote:
>>> Hello,
>>> 
>>> So this ticket is more about a reasoning ontology than it is about a data 
>>> validation/verification/constraint schema.
>>> 
>>> The former is "easy" to do as it's a query-time model. The latter is more 
>>> difficult as we would have to expose some sort of Schema interface for 
>>> graph system providers to expose schema constraints. Furthermore, each 
>>> provider tends to do things differently (much like indices and thus 
>>> TinkerPop is agnostic to the concept of index). For instance, Titan has a 
>>> pretty rich schema model while Neo4j (I believe) only supports things like 
>>> UNIQUE on a name (e.g.).
>>> 
>>> You could argue that a Schema system could be developed at the 
>>> TraversalStrategy level, but then it starts to get hairy when people use 
>>> "Blueprints" to write to the graph or the native interfaces of the 
>>> underlying provider (e.g. using Cypher to write data). Now TinkerPop will 
>>> think data is in one format, but it's in another…. 
>>> 
>>> Can you say more as to how you see a validation/verification/constraint 
>>> model being specified/implemented in a provider-agnostic way for TinkerPop3?
>>> 
>>> Thanks,
>>> Marko.
>>> 
>>> http://markorodriguez.com
>>> 
>>> On Oct 9, 2015, at 1:28 PM, pieter-gmail <[email protected]> wrote:
>>> 
>>>> Hi,
>>>> 
>>>> Perhaps I am missing exactly what you are saying but it seems to me gremlin
>>>> might become schema aware.
>>>> 
>>>> This is something I consider as crucial in understanding any data set.
>>>> Perhaps it's from my background but I generally fail to see how the
>>>> NoSql/NoSchema/Document crowd understand their data by looking at rows
>>>> or documents or vertices without a picture of the schema.
>>>> 
>>>> The schema may be lazily created but nonetheless all systems, I'd say,
>>>> have an implicit schema which imho should be the starting point of any
>>>> analysis.
>>>> 
>>>> This is true even if it's some random key put into a Redis instance.
>>>> 
>>>> While I am on the topic, even the tp3 modern graph, trivial as it may
>>>> be, would be easier for me to 'get' if it was illustrated with a schema
>>>> diagram before the graph itself was illustrated.
>>>> 
>>>> Cheers
>>>> Pieter
>>>> 
>>>> On 09/10/2015 20:07, Marko Rodriguez wrote:
>>>>> ardog4-fame on a blogpost discussing how Gremlin can traverse 
>>>>> ontologically implied edges in the Stardo
