Hey,

This might be a bit long, but it should explain a few of the pitfalls of
making a Gremlin language variant outside of the JVM.

The biggest challenges fall into the following categories. I'll elaborate
on these further down:
  1. Method overloading is not always native, or it is implemented
differently (some languages have very limited typing as well).
  2. Even if the overloading is handled correctly, conflicts can arise from
the lack of typing in some languages and from the use of bindings and/or
server-side variables.
  3. For a functional library, the APIs for Graph, traversal,
Elements (node/vertex), and the native Java API need to be made available.

There are also further issues related to Gremlin versions, performance, and
functionality that I'll skim at the end of this post.

*1. Method overloading*

abstract class Query {
   public function has(PropertyKey $key); //1
   public function has(PropertyKey $key, Object $value); //2
   public function has(Label $label, String $value); //3
   public function has(VertexId $id, Long $value); //4
   public function has(VertexId $id, Int $value); //5
   public function has(VertexId $id, Predicate $p); //6
}

The above is illegal in languages like PHP (or JavaScript?). Instead we're
stuck with:

abstract class Query {
   public function has(Array $args);
}

We're then left to figure out what is what in the array and sort out how we
need to stringify the output.

If the user does $g->V()->has("label", "user"), do we add quotes to the
first argument, or is it a label/id? What about the second argument: is it
a predicate? This gets complex very quickly.
And what about $g->V()->has("id", 36)? PHP only has a single Int type, so
one of the two signatures (4 or 5) needs to give, as we have a major
conflict. This example is fictional for has(), but I've run into this on a
couple of other methods; I just can't remember which.
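
To make the dispatch problem concrete, here is a minimal sketch in Python
(used here only as another non-JVM language; it is not taken from our PHP
builder). Label, Id and Predicate are hypothetical marker classes invented
for this illustration. It shows one entry point inspecting its arguments to
decide what gets quoted. Note that it does not solve the Long vs Int
conflict: both would arrive as the same integer type.

```python
from numbers import Number


class Label:
    """Hypothetical marker: the key refers to the element label."""


class Id:
    """Hypothetical marker: the key refers to the element id."""


class Predicate:
    """Hypothetical wrapper for a predicate such as neq(...)."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def render(value):
    """Turn one argument into its Gremlin-Groovy text form."""
    # Booleans are checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, Number):
        return str(value)
    if isinstance(value, Predicate):
        return f"{value.name}({render(value.value)})"
    if isinstance(value, Label):
        return "label"
    if isinstance(value, Id):
        return "id"
    # Anything else is treated as a plain string and quoted.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def has(*args):
    """Single entry point standing in for the overloaded has() signatures."""
    if not 1 <= len(args) <= 2:
        raise TypeError("has() takes one or two arguments")
    return ".has(" + ", ".join(render(a) for a in args) + ")"
```

With this, has("label", "user") quotes both arguments, while
has(Label(), "user") emits the bare label token. The user has to know
which one they mean, which is exactly the kind of "gotcha" described above.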

Another example would be g.V().has(id, neq(m)). We could imagine the
following PHP equivalent: $g->V()->has(new Id(), Predicate::neq("m")), where
Id() is a class that helps us recognize this type and neq() is a static
method of Predicate. However, "m" has to be passed as a string and we have
no clue what m is. Is it a string, a binding, or a server-side variable?
More on this in point *2.*

To close things off here, there's also the case of signatures like
out(String... edgeLabels) that need their own logic.
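
As a rough illustration of the varargs case, a Python sketch (the function
names are made up for this example) could accept any number of labels and
quote each one:

```python
def quote(text):
    """Quote a string for a Gremlin-Groovy script, escaping \\ and '."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def out(*edge_labels):
    """Stand-in for out(String... edgeLabels): zero or more labels."""
    for label in edge_labels:
        if not isinstance(label, str):
            raise TypeError("edge labels must be strings")
    return ".out(" + ", ".join(quote(label) for label in edge_labels) + ")"
```

Languages without native variadic arguments would need to fall back to an
array argument here, which brings back the "what is what in the array"
question.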

*Conclusion*: There's a lot of manual work that needs to go into separating
the logic between signatures and handling special cases. Part of this can
be automated, for example by parsing the Javadocs, if your language supports
magic getters and setters. But that is a big if, and the rest will still
be manual. This step is maintenance heavy.

*2. Conflicts*

Because we're manipulating strings, it's really hard to tell a few items
apart (binding vs. server variable vs. string; there's a reason why I
separate binding and variable).

For instance, in the example above (*Gremlin:* g.V().has(id, neq(m)) vs.
*PHP:* $g->V()->has(new Id(), Predicate::neq("m"))), we don't know what to
make of m. Is it a binding, a string, or even a variable that was
previously set in the session? There is no clean way of working around this.

Firstly, because bindings tend to be handled on a different layer than the
query builder.
Secondly, because methods that help avoid the conflicts also lose typing
data.
For example, $g->V()->has(new Id(), Predicate::neq(Query::variable("m")))
could generate the proper query by outputting m without quotes, but we
don't know what type m is, so in some cases it might be tricky to select
the proper signature.
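
A small Python sketch of the explicit-wrapper idea, assuming two
hypothetical classes, Variable and Binding, that are not part of any real
driver. A plain string is quoted, a Variable is emitted bare, and a Binding
is emitted bare while its value is collected for the driver layer:

```python
class Variable:
    """Hypothetical: a name already defined in the server-side session."""

    def __init__(self, name):
        self.name = name


class Binding:
    """Hypothetical: a named parameter sent alongside the script."""

    def __init__(self, name, value):
        self.name = name
        self.value = value


def render(value, bindings):
    """Render one argument; collected bindings go into the given dict."""
    if isinstance(value, Variable):
        return value.name
    if isinstance(value, Binding):
        bindings[value.name] = value.value
        return value.name
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return str(value)
    raise TypeError(f"unsupported argument type: {type(value).__name__}")


def neq(value, bindings):
    """Render a neq(...) predicate around one argument."""
    return f"neq({render(value, bindings)})"
```

The Variable case shows the typing loss mentioned above: the builder emits
m but has no idea whether m holds a number, a string, or something else.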

*Conclusion*: There are a number of ways around this point. We use the
prefixes B_m or V_m and a hack to ignore signatures altogether when in this
scenario. It's not that these problems aren't solvable; they just aren't
trivial.
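
For what it's worth, a prefix convention like this can be sketched in a few
lines of Python. This is only a guess at the general shape, assuming B_
marks a binding and V_ marks a session variable; it is not our actual code:

```python
import re

# Assumed convention: B_<name> is a binding, V_<name> is a session variable.
PREFIXED = re.compile(r"^(B|V)_\w+$")


def classify(token):
    """Return 'binding', 'variable' or 'string' for a raw string argument."""
    match = PREFIXED.match(token)
    if match is None:
        return "string"
    return "binding" if match.group(1) == "B" else "variable"


def render(token):
    """Quote plain strings; emit prefixed names without quotes."""
    if classify(token) == "string":
        escaped = token.replace("\\", "\\\\").replace("'", "\\'")
        return f"'{escaped}'"
    return token
```

The obvious trade-off is that a literal string value such as "B_m" can no
longer be expressed without some extra escape mechanism.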

*3. API*

Why we would need traversal, graph, vertex and edge APIs is quite
self-explanatory for everyday work with Gremlin. I'm just going to explain
why we would also require some Java classes.

Because JSON is lossy by nature, we often have to cast variables to certain
types, for example by submitting this kind of script:
g.V(1).property("date", new Date(B_m)); with B_m = timestamp. This is just
another case that is difficult to cover.
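
A hedged Python sketch of what a builder has to produce for that Date case.
The function name and the returned dictionary layout are assumptions made
for illustration, and the timestamp is assumed to be epoch milliseconds,
which is what Java's Date(long) constructor expects:

```python
from datetime import datetime, timezone


def date_property_request(vertex_id, key, moment):
    """Build a script plus bindings that casts a timestamp to a Java Date.

    JSON has no date type, so the value travels as a number and the script
    itself performs the cast on the server side.
    """
    millis = int(moment.timestamp() * 1000)
    script = f"g.V({int(vertex_id)}).property('{key}', new Date(B_m))"
    return {"gremlin": script, "bindings": {"B_m": millis}}


request = date_property_request(
    1, "date", datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc)
)
```

The builder therefore needs to know about java.util.Date even though the
host language has its own date type.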

This adds to the other points in making a Gremlin language variant
non-trivial.

All of the above can be worked around by using an injection method that
just appends a string to the query: $g->customStep("V().has(id, neq(m))"),
but that's beside the point.
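
Such an escape hatch is easy to sketch. Below is a minimal Python version,
with a hypothetical Traversal class and custom_step method named after the
PHP call above:

```python
class Traversal:
    """Tiny string-building traversal, for illustration only."""

    def __init__(self, source="g"):
        self._script = source

    def V(self):
        self._script += ".V()"
        return self

    def custom_step(self, raw):
        """Append raw Gremlin verbatim; no quoting or validation happens."""
        self._script += "." + raw.lstrip(".")
        return self

    def __str__(self):
        return self._script
```

Since the raw string bypasses all quoting, it should never be built from
untrusted input.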

*Final Conclusion:* It's not a trivial task. Of course, the examples above
are very verbose, and achieving something closer to Gremlin in style is
possible, but there are always going to be "gotchas" users will need to keep
in mind. A while back, in TP2, I released a PHP library for this (the one we
currently use in our projects). I removed it because it was too much
maintenance to get it to work across use cases, so I decided to
concentrate on our own (some choices made in *2.* wouldn't have worked
for other cases).
I'm convinced there's got to be a way of reconciling everything and getting
this to work flawlessly, but it's going to require a lot of thought/work.


PS: I mentioned some other points, like managing multiple versions of
Gremlin (for two lines of releases), which is a real headache.
For performance it may be good to allow the builder to handle multiple
lines, which comes with its load of complications as well.
And then there's the ability to "block" queries and either inject them into
each other or merge them together, which simplifies unit testing and extends
functionality:

$query = $g->V()->out("likes")->flag("flagname")->has("age", 20);
// Some logic here accesses new information and realizes the query needs
// altering
$query->getFlag("flagname")->out("hates", true); // true for merge
$query->toString(); // g.V().out('likes', 'hates').has('age', 20)

But this point alone could warrant its own email, as it is relatively
complex, though TP3 has simplified some cases thanks to union() and some
other steps.
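
To give a rough idea of the flag-and-merge mechanism, here is a simplified
Python sketch. It is not how our builder is implemented: the Query class
and the merge_out method are invented for this example, and merge_out
replaces the getFlag(...)->out(..., true) call chain. The key point is that
steps are kept as a list instead of a string, so an earlier step can still
be altered before rendering.

```python
class Query:
    """Keeps steps as data so that flagged steps can be altered later."""

    def __init__(self, source="g"):
        self.source = source
        self.steps = []   # each entry is [step_name, [arguments]]
        self.flags = {}   # flag name -> index into self.steps

    def _add(self, name, *args):
        self.steps.append([name, list(args)])
        return self

    def V(self):
        return self._add("V")

    def out(self, *labels):
        return self._add("out", *labels)

    def has(self, key, value):
        return self._add("has", key, value)

    def flag(self, name):
        """Remember the most recently added step under the given name."""
        if not self.steps:
            raise ValueError("nothing to flag yet")
        self.flags[name] = len(self.steps) - 1
        return self

    def merge_out(self, flag_name, *labels):
        """Merge extra edge labels into a previously flagged out() step."""
        name, args = self.steps[self.flags[flag_name]]
        if name != "out":
            raise ValueError("flagged step is not out()")
        args.extend(labels)
        return self

    @staticmethod
    def _render(value):
        if isinstance(value, str):
            escaped = value.replace("\\", "\\\\").replace("'", "\\'")
            return f"'{escaped}'"
        return str(value)

    def __str__(self):
        rendered = [
            f".{name}({', '.join(self._render(a) for a in args)})"
            for name, args in self.steps
        ]
        return self.source + "".join(rendered)


query = Query().V().out("likes").flag("flagname").has("age", 20)
query.merge_out("flagname", "hates")
```

Rendering the query after the merge gives the same string as the PHP
example above.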

Our builder supports all of the above so if you have any questions feel
free to ask me.

Phew that was long. I'll add this to the ticket in a bit.

On Tue, Apr 12, 2016 at 4:37 PM, Marko Rodriguez <okramma...@gmail.com>
wrote:

> Hello everyone,
>
> Please see the section entitled "Host Language Embedding" here:
> http://www.planettinkerpop.org/#gremlin (3 sections down)
>
> When I was writing up this section, I noticed that most of the language
> drivers that are advertised on our homepage (
> http://tinkerpop.incubator.apache.org/#graph-libraries) know how to talk
> to Gremlin Server via web sockets, REST, etc., but rely on the user to
> create a String of their graph traversal and submit it. For instance, here
> is a snippet from the Gremlin-PHP documentation:
>
> $db = new Connection([
>     'host' => 'localhost',
>     'graph' => 'graph',
>     'username' => 'pomme',
>     'password' => 'hardToCrack'
> ]);
> //you can set $db->timeout = 0.5; if you wish
> $db->open();
> $db->send('g.V(2)');
> //do something with result
> $db->close();
>
>
> $db->send(String) is great, but it would be better if the user didn't have
> to leave PHP.
>
> Please see this ticket:
> https://issues.apache.org/jira/browse/TINKERPOP-1232
>
> I think for non-JVM languages, it would be nice if these drivers (PHP,
> JavaScript, Python, etc.) didn't require the user to explicitly create
> Gremlin-XXX Strings, but instead either used JINI or model-3 in the ticket
> above. Let's look at model-3 as I think it's the easiest and most general.
>
> For instance, they would have a class in their native language that would
> mirror the GraphTraversal API. *** I don't know any other languages well
> enough, so I'm just going to do this in Groovy :), hopefully you get the
> generalized point. ***
>
> public class Test {
>
>   String s;
>
>   public Test(final String source) {
>     s = source;
>   }
>
>   public Test() {
>     s = "";
>   }
>
>   public Test V() {
>     s = s + ".V()";
>     return this;
>   }
>
>   public Test outE(final String label) {
>     s = s + ".outE(\"${label}\")";
>     return this;
>   }
>
>   public Test repeat(final Test test) {
>     s = s + ".repeat(${test.toString()})";
>     return this;
>   }
>
>   public String toString() {
>     return s;
>   }
> }
>
>
> Then, via fluency (function composition) and nesting, you could generate a
> Gremlin-Groovy (or which ever ScriptEngine language) traversal String in
> the backend.
>
> gremlin> g = new Test("g");
> ==>g
> gremlin> g.V().outE("knows")
> ==>g.V().outE("knows")
> gremlin>
> gremlin> g = new Test("g");
> ==>g
> gremlin> g.V().repeat(new Test().outE("knows"))
> ==>g.V().repeat(.outE("knows"))
> gremlin>
>
>
> From there, that String is then submitted as you normally do with your
> driver. For instance, with Gremlin-PHP, via $db->send(String).
>
> Of course, if your driver is already on a JVM language, there is no reason
> to do this (e.g. Gremlin-Scala), but if you are not on the JVM, this gives
> the user host language embedding and a more natural "look and feel."
> Moreover, if your language doesn't use "dot notation," you would use the
> natural idioms of your language.
>
> $g->V->outE("knows")
>
>
> If anyone is interested in updating their non-JVM language driver to use
> this model, I would like to write a blog post about it. Or perhaps, a
> tutorial for language designers.
>
> Thoughts?,
> Marko.
>
> http://markorodriguez.com
>