Hi Pieter,

> I'll spend some time looking at sparql-gremlin and sql-gremlin to get a
> feel for it.

Cool. Yea, I'm interested in a RegEx to Gremlin compiler for the 
gremlin-examples/ package people are talking about.

        
[id:0][knows]*[label:person][age:gt(32)]([created]|[worksFor])+[knows]{5}

:) … Dunno, something crazy like that.


> Regarding implementing Gremlin in C, C++, Go, JavaScript, I cannot see
> how to do it without the Gremlin language specification itself becoming
> completely declarative. I imagine/hope it can still keep its fluent
> nature but that 'graphTraversal.out().repeat().this().that()' is no
> longer tied to a Java interface but only to a specification.
> Unfortunately I am actually out of my depth here as I have never
> written AST parsers or worked anywhere near this field.

What do you mean by declarative here? Gremlin as a language (which has both 
"declarative" and "imperative" features) is simply a grouping of operations 
(called steps) into the following legal patterns:

f * g * h * k (linear motif)
f(g, h) * k   (nested motif)

That is Gremlin's syntax. Next, what are these "steps"? They are specified in 
the GraphTraversal API -- count(), sum(), group(), groupCount(), match(), 
union(), etc. All steps can be chained linearly; however, only some support 
nesting (e.g. union(out("knows"),in("knows"))). There are also "step-modulators" 
that would need to be made clearer. However, they are simply syntactic 
sugar for nesting. For example, groupCount().by("name") is really just 
groupCount(values("name")).
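
To make the two motifs concrete, here is a toy sketch in plain Java. It is not 
TinkerPop code: MotifSketch, chain(), union(), map() and the sample numbers are 
all made up for illustration, and a "step" is assumed to be nothing more than a 
function over a list of traversers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Toy model only: a "step" is a function from a list of traversers to a result.
public class MotifSketch {

    // Linear motif, f * g: run f, then feed its output to g.
    static <A, B, C> Function<A, C> chain(Function<A, B> f, Function<B, C> g) {
        return f.andThen(g);
    }

    // Nested motif, f(g, h): a step whose arguments are themselves steps.
    static <A, B> Function<List<A>, List<B>> union(Function<List<A>, List<B>> g,
                                                   Function<List<A>, List<B>> h) {
        return in -> {
            List<B> out = new ArrayList<>(g.apply(in));
            out.addAll(h.apply(in));
            return out;
        };
    }

    // Lifts a per-element function into a step over all traversers.
    static <A, B> Function<List<A>, List<B>> map(Function<A, B> fn) {
        return in -> in.stream().map(fn).collect(Collectors.toList());
    }

    // f * g * k (linear motif)
    static int linearCount(List<Integer> start) {
        Function<List<Integer>, List<Integer>> inc = map(x -> x + 1);
        Function<List<Integer>, List<Integer>> dbl = map(x -> x * 2);
        Function<List<Integer>, Integer> count = List::size;
        return chain(chain(inc, dbl), count).apply(start);
    }

    // f(g, h) * k (nested motif, then linear)
    static int nestedCount(List<Integer> start) {
        Function<List<Integer>, List<Integer>> inc = map(x -> x + 1);
        Function<List<Integer>, List<Integer>> dbl = map(x -> x * 2);
        Function<List<Integer>, Integer> count = List::size;
        return chain(union(inc, dbl), count).apply(start);
    }

    public static void main(String[] args) {
        System.out.println(linearCount(List.of(1, 2, 3))); // prints 3
        System.out.println(nestedCount(List.of(1, 2, 3))); // prints 6
    }
}
```

The nested case counts six traversers because union() runs both branches over 
the same input and concatenates the results before count sees them.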

> However I am imagining the Steps and GraphTraversal to be specified
> purely in natural language and that TinkerPop's first job would
> perhaps be to write AST parsers for the current most popular languages.
> Once that is done the Java world of TinkerGraph, Neo4j, Cassandra,
> Hadoop.... starts.

Why!? The beauty of Gremlin is that it only requires that the host language 
support function composition! And guess what, every language supports that. 
Thus, in C++, you would implement GraphTraversal as a C++ class much like 
Gremlin-Java. Let the language parser do the parsing for you! This is why 
Gremlin is the way it is: it doesn't take an expert in language parsing to 
create a Gremlin language variant. Look at Gremlin-Groovy, Gremlin-Scala, 
Gremlin-JavaScript, Gremlin-etc... Gremlin is so dead simple from a syntactic 
point-of-view.
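
As a rough sketch of what "let the language parser do the parsing" means, here 
is a toy fluent class in Java. ToyTraversal and everything in it is 
hypothetical and is not the real GraphTraversal API; it only records step 
names as strings, whereas a real language variant would presumably build and 
execute actual step objects.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical toy class, not the real GraphTraversal API.
public class ToyTraversal {
    // Steps are recorded in call order; the host compiler did all the parsing.
    private final List<String> steps = new ArrayList<>();

    private ToyTraversal add(String step) {
        steps.add(step);
        return this; // returning this is what makes linear chaining work
    }

    public ToyTraversal hasLabel(String label) { return add("hasLabel(" + label + ")"); }
    public ToyTraversal out(String label)      { return add("out(" + label + ")"); }
    public ToyTraversal in(String label)       { return add("in(" + label + ")"); }
    public ToyTraversal count()                { return add("count()"); }

    // Nesting: the arguments are traversals built by the same fluent calls.
    public ToyTraversal union(ToyTraversal... branches) {
        String inner = Arrays.stream(branches)
                .map(ToyTraversal::toString)
                .collect(Collectors.joining(","));
        return add("union(" + inner + ")");
    }

    @Override
    public String toString() {
        return String.join(".", steps);
    }

    public static void main(String[] args) {
        ToyTraversal t = new ToyTraversal()
                .hasLabel("person")
                .union(new ToyTraversal().out("knows"), new ToyTraversal().in("knows"))
                .count();
        // prints hasLabel(person).union(out(knows),in(knows)).count()
        System.out.println(t);
    }
}
```

No grammar, lexer or AST code appears anywhere: method chaining gives the 
linear motif and passing traversals as arguments gives the nested one.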

> Following on a previous thread on RDF, would Gremlin need a proper
> specification of its meta model? As it stands, Gremlin's implicit
> meta model does not fit exactly with that of RDF. Again I am somewhat
> out of my depth but still. Perhaps even before Gremlin as a query
> language can be fully specified, its underlying meta model needs to be
> fully and accurately specified in natural language?

Yea, that's important. There are a few articles people have written about RDF 
<-> PropertyGraph mapping. The property graph data structure is a little more 
complex than RDF, but it shouldn't be that hard to define in a specification.

> Would this approach open the door to Gremlin becoming an ANSI standard?

I think things like ANSI, W3C, etc. are old skool ways of doing such things. 
People use Hadoop. Where is the standard?! MapR just implements the HDFS API. 
DataStax just implements the HDFS API over Cassandra. No need to get all "a 
tribunal of thought leaders met in Zurich to procure standard version 
1.0.0.11.235.a."

I see it like this. Apache IS THE standards organization. If you get enough 
people using this stuff and it's easy to map to other languages/frameworks/etc., 
then the Apache representation of the language becomes the de facto 
representation. We have dev@, we have a VOTE model, we have PRs, we have a PMC. 
That is the modern day equivalent of "thought leaders meeting in Zurich." It's a 
collectively generated body of work that uses Apache processes to ensure 
quality. However, instead of going paper->code, we go code->paper. That's the 
difference between Apache and (e.g.) ANSI.

More thoughts?,
Marko.

http://markorodriguez.com


> On 13/01/2016 20:47, Marko Rodriguez wrote:
>> Hello Pieter,
>> 
>>> Have you started working on this?
>> Sorta. In my mind.
>> 
>>> I was wondering if this does not actually warrant a project of its own.
>>> Perhaps outside Apache as Apache (as far as I know) is not really in the
>>> business of publishing specs. If it has a space of its own somewhere it
>>> could be a more collaborative effort.
>>> … SNIP
>> Lately, in my daydreams, I've been leaning towards "a specification" 
>> inspired by the way Kuppitz built sparql-gremlin 
>> [https://github.com/dkuppitz/sparql-gremlin] and Wilmes built sql-gremlin 
>> [https://github.com/twilmes/sql-gremlin]. At first, I was all hell-bent on 
>> being all "virtual machine"-style with an instruction set (opcodes, 
>> parameters, and the like). However, after seeing how Kuppitz and Wilmes 
>> implemented their respective compilers, I thought it prudent to step out of 
>> the 80's and acknowledge that GraphTraversal is THE specification! 
>> 
>> What do I mean by this?
>> 
>> I don't think we should ever publish individual steps as being important. If 
>> we do, we will spend all our time trying to make the step library tiny 
>> ("only 15 instructions needed man!"). Also, we shouldn't do that because it 
>> leads to optimization issues. For instance, you can implement GroupCountStep 
>> using GroupStep, but it's not as fast. So, instead of focusing on individual 
>> steps, I think we should focus on GraphTraversal as the "machine" that 
>> generates your Gremlin traversals. Wilmes and Kuppitz never talk steps in 
>> their compilers; they simply create an empty GraphTraversal and then start 
>> appending steps (via GraphTraversal.method()) as defined by the 
>> respective high-level language parse. So much easier!
>> 
>> Similarly, this is how I believe DSLs should be created:
>> 
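>> import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
>> import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.DefaultGraphTraversal;
>> import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversal;
>> 
>> public class MyTraversal<S,E> implements Traversal.Admin<S,E> {
>>  // every DSL method delegates to a plain GraphTraversal
>>  GraphTraversal<S,E> rawTraversal = new DefaultGraphTraversal<>();
>> 
>>  public MyTraversal<S,E> people() {
>>    rawTraversal.hasLabel("person");
>>    return this; // return the DSL object so calls can be chained
>>  }
>> 
>>  public MyTraversal<S,E> knows(String personName) {
>>    rawTraversal.out("knows").has("name",personName);
>>    return this;
>>  }
>> 
>>  public E next() {
>>    return rawTraversal.next();
>>  }
>>  ...
>> }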
>> 
>> If DSL designers are thinking of steps and their parameterization, it will 
>> really freeze our ability to optimize at the step level. Moreover, we don't 
>> want to encourage people to create their own steps as we want our 
>> TraversalStrategies to be generally useful regardless of the human-level 
>> language. Thus, the mantra should be "all traversals via GraphTraversal."
>> 
>> To your questions about a "language specification." If someone wants to 
>> create Gremlin in C++, well, I think they basically copy the GraphTraversal 
>> API and implement the respective functionality in various C++ step 
>> implementations. Thus, in a way, the GraphTraversal JavaDoc defines the 
>> language -- it is the specification (though, we need more semantics 
>> defined). I think this is the way to go instead of saying "here are all the 
>> instructions/steps in Gremlin with their respective parameterizations."
>> 
>> Thoughts?,
>> Marko.
>> 
>> http://markorodriguez.com
> 
