Oy, so much to say,

Ontology is the "study of the nature of being" (of the graph).

The traditional notion of schema is a subset of the rather infinite
understanding (Ontology) and, I'd say, for many it is the starting point
of any understanding.

I would surmise that a reasoning ontology would have to have some
knowledge of the nature (meta) of the graph. This would include which
labels are associated with which, the multiplicity, uniqueness, order,
ownership, constraints... It might be easy, as you say, but it is
ubiquitous and the structural foundation of any ontology.

The problems you mention regarding different providers are something
that with time, success and confidence might become less of an issue.

I am of the opinion that much of the above-mentioned ontological stuff
is a mostly abstract concern for tp3. Just an interface specification.
Specifying uniqueness or whatever is an ontological concern; indexes,
however, are an implementation concern of the provider. BTW, the same
goes for full text search. The features/limitations of Lucene or
whatever technology should not be the primary concern of tp3. Within
reason of course. No point in specifying that which no one can implement.

In some ways tp3 (or me) is confused about tp3 being an implementation
versus a specification. This concerns me a lot when I need to optimize
tp3 steps. The more I optimize, the less tp3 code executes. Don't get me
wrong however, without the default implementation I would never even
have started.

Another concern I have regarding all this is tp's agnosticism with
respect to typing. An ontology surely needs some knowledge about the
types it supports and reasons over.

My own idea for implementing a schema model for tp3 is far more
simplistic to start off with. I am toying with the idea of making it a
sort of tp3-contrib lib. That way, for any graph implementation, an
application higher up the stack will be able to access tp3 semantic
schema information in an implementation-agnostic manner.

The basic idea is to have a special partitioned graph with limited
schema information. The default implementation stays with current tp3
semantics except for capturing the Java type of any property. Basically
(not really thought about the details yet) a graph of
vertexLabel->edgeLabel->vertexLabel with their respective properties and
types.
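
To make the idea a bit more concrete, here is a rough sketch in plain
Java of what such a lazily captured schema could look like. Nothing here
is tp3 API: LazySchema, observeVertexProperty, observeEdge and typeOf
are hypothetical names made up for illustration, and a real version would
presumably live in its own graph partition rather than in in-memory maps.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not part of tp3: a schema that is filled in lazily
// as data is written, so nothing has to be declared upfront.
final class LazySchema {
    // vertex label -> (property key -> Java type first seen for that key)
    private final Map<String, Map<String, Class<?>>> vertexProps = new LinkedHashMap<>();
    // each entry is encoded as "outVertexLabel->edgeLabel->inVertexLabel"
    private final Set<String> edges = new LinkedHashSet<>();

    // Record the Java type of a property the first time it is seen.
    void observeVertexProperty(String vertexLabel, String key, Object value) {
        vertexProps.computeIfAbsent(vertexLabel, l -> new LinkedHashMap<>())
                   .putIfAbsent(key, value.getClass());
    }

    // Record that an edge label connects two vertex labels.
    void observeEdge(String outLabel, String edgeLabel, String inLabel) {
        edges.add(outLabel + "->" + edgeLabel + "->" + inLabel);
    }

    // Returns the captured type, or null if the label/key was never seen.
    Class<?> typeOf(String vertexLabel, String key) {
        Map<String, Class<?>> props = vertexProps.get(vertexLabel);
        return props == null ? null : props.get(key);
    }

    Set<String> edges() {
        return Collections.unmodifiableSet(edges);
    }
}
```

A provider would call the observe methods from wherever it already
handles addVertex/addEdge/property, so the application never has to
declare anything.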

Providers can then add custom features, like adding 'in' and 'out'
properties to add multiplicity, order, constraints, transitiveness and...
The stricter tp3 becomes with specifying the ontological nature of graphs,
the richer the standard partitioned schema graph will become. However, it
will always remain lazy and schemaless, with no need to specify anything
upfront.

In your 'process reasoner' you explicitly specify the features of an
edge; as far as I can see this is no different from what you would have
to do with a 'structured reasoner'. In a default 'structured reasoner'
there is nothing to specify, unless you wish to say that some label is
'transitive' or whatever. The time/space cost of capturing the schema,
and of loading it when starting up an existing graph, is in general
minimal as the schema is so very small compared to the actual data.
Somewhere in the ether I have heard that SAP has something like 50000
tables. A lot to understand but not much to load in space and time. The
schema partition should also be optional, probably even off by default.

To give you some indication of my own implementation issue with sqlg:
Sqlg now supports Java 8 java.time
LocalDateTime/LocalDate/LocalTime/Duration/Period.
Duration, Period and Integer values are all stored as integers in the
rdbms; however, there is no way without some schema information to know
whether some integer field represents a Duration, a Period or just an
Integer. vertex.value("duration") should return a java.time.Duration but
alas, without additional schema support, there is no way to know what the
type of the field is.
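
A small sketch of the problem, in plain Java. This is not sqlg's actual
code: TypedColumnDecoder, declare and decode are hypothetical names, and
the encodings (Duration as seconds, Period as days) are assumptions
chosen only to keep the example short.

```java
import java.time.Duration;
import java.time.Period;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not sqlg code: the raw integer read from the rdbms
// can only be turned back into the right Java type if the type of the
// column was captured somewhere.
final class TypedColumnDecoder {
    // column name -> Java type recorded when the property was first written
    private final Map<String, Class<?>> columnTypes = new HashMap<>();

    void declare(String column, Class<?> javaType) {
        columnTypes.put(column, javaType);
    }

    Object decode(String column, long raw) {
        Class<?> type = columnTypes.get(column);
        if (type == Duration.class) {
            return Duration.ofSeconds(raw);              // assumed encoding: seconds
        }
        if (type == Period.class) {
            return Period.ofDays(Math.toIntExact(raw));  // assumed encoding: days
        }
        if (type == Integer.class) {
            return Math.toIntExact(raw);
        }
        // No schema information: all we can hand back is the raw number.
        return raw;
    }
}
```

With the type recorded, decode("duration", 90) gives back a
java.time.Duration; without it the caller just gets the number 90 and has
no way of telling what it was meant to be.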

If tp3 decides to have an opinion regarding typing, I'd say Java
primitives, arrays of primitives and java.time.* should be standard
without much discussion.

Thanks
Pieter



On 09/10/2015 21:35, Marko Rodriguez wrote:
> Hello,
>
> So this ticket is more about a reasoning ontology than it is about a data 
> validation/verification/constraint schema.
>
> The former is "easy" to do as it's a query-time model. The latter is more 
> difficult as we would have to expose some sort of Schema interface for graph 
> system providers to expose schema constraints. Furthermore, each provider 
> tends to do things differently (much like indices and thus TinkerPop is 
> agnostic to the concept of index). For instance, Titan has a pretty rich 
> schema model while Neo4j (I believe) only supports things like UNIQUE on a 
> name (e.g.).
>
> You could argue that a Schema system could be developed at the 
> TraversalStrategy level, but then it starts to get hairy when people use 
> "Blueprints" to write to the graph or the native interfaces of the underlying 
> provider (e.g. using Cypher to write data). Now TinkerPop will think data is 
> in one format, but it's in another…
>
> Can you say more as to how you see a validation/verification/constraint model 
> being specified/implemented in a provider-agnostic way for TinkerPop3?
>
> Thanks,
> Marko.
>
> http://markorodriguez.com
>
> On Oct 9, 2015, at 1:28 PM, pieter-gmail <[email protected]> wrote:
>
>> Hi,
>>
>> Perhaps I am missing exactly what you are saying but it seems to me gremlin
>> might become schema aware.
>>
>> This is something I consider crucial in understanding any data set.
>> Perhaps it's from my background but I generally fail to see how the
>> NoSql/NoSchema/Document crowd understand their data by looking at rows
>> or documents or vertices without a picture of the schema.
>>
>> The schema may be lazily created but nonetheless all systems, I'd say,
>> have an implicit schema which imho should be the starting point of any
>> analysis.
>>
>> This is true even if it's some random key put into a Redis instance.
>>
>> While I am on the topic, even the tp3 modern graph, trivial as it may
>> be, would be easier for me to 'get' if it was illustrated with a schema
>> diagram before the graph itself was illustrated.
>>
>> Cheers
>> Pieter
>>
>> On 09/10/2015 20:07, Marko Rodriguez wrote:
>>> ardog4-fame on a blogpost discussing how Gremlin can traverser 
>>> ontologically implied edges in the Stardo
>
