[DISCUSS] Graph.io() deprecation

Stephen Mallette Fri, 29 Jun 2018 11:13:57 -0700

So, we made the change on:

https://issues.apache.org/jira/browse/TINKERPOP-1985


to deprecate BulkLoaderVertexProgram (BLVP) because we're going to delegate bulk loading to graph
providers. The issue I now see is that Graph.io() is also a form of bulk
loading which means it should probably go the route of deprecation if we're
following our thinking to the letter. But I don't think that "bulk
loading" is the only reason to move on from Graph.io(). Here are some
problems with it from a user's perspective:

1. The API is terrifying - there are reasons why this API developed the way
that it did, but I won't dwell on them. The point is that it's generally
hard to use with all the Builder patterns layering in and out of each
other. This is especially true if you have to customize some aspect of the
configuration in some way.
2. You can't use it with GLVs.
3. It forces reliance on the Graph API, which we try to quietly hide from
users, and some graphs don't even support that aspect of the API directly.

From the graph provider perspective:

1. If you don't have a Graph API you have no way to support this, which is
bad because users look at the TinkerPop docs, see this io() capability
all over the place, and get confused when it isn't available.
2. You have no way to optimize the loading process.

I won't even go into all the internal reasons I dislike Graph.io() - hehe

So, after careful thought, I think that deprecation makes sense. However, I
don't think we want to completely lose the convenience of Graph.io().

"But you said that we would delegate all bulk loading to provider tools!"

Well, yes, and I think that idea still holds with the caveat that in
providing this convenience we make it at least possible for a graph
provider to optimize on it. So, I'm proposing that we make "io()" a part of
GraphTraversalSource by adding:

g.read(resource)
g.write(resource)

where resource is a string representing a URL to some file location. We
might have other overloads here, but I'm not sure what those are yet. These
methods would essentially be new steps to the Gremlin language and thus
allow additional configurations to be passed using our new with() step. By
default, we can rely on OLTP style reading/writing so this will work for
any graph system.
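
As a rough illustration of the shape this could take, here is a toy model
in plain Java. ToySource, ToyReadStep, the file URL and the "format" option
key are all made up for this sketch and are not TinkerPop classes or
settings. The only point is that read() becomes an ordinary step that
with() can decorate with configuration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for a read step; NOT a TinkerPop class.
final class ToyReadStep {
    final String resource;                                      // URL of the file to load
    final Map<String, Object> options = new LinkedHashMap<>();  // filled in by with()

    ToyReadStep(String resource) {
        this.resource = resource;
    }

    // Mirrors the idea of a with() modulator: attach one option, return the step.
    ToyReadStep with(String key, Object value) {
        options.put(key, value);
        return this;
    }

    @Override
    public String toString() {
        return "read(" + resource + ")" + options;
    }
}

// Toy stand-in for GraphTraversalSource; NOT a TinkerPop class.
final class ToySource {
    ToyReadStep read(String resource) {
        return new ToyReadStep(resource);
    }
}

final class ReadSketch {
    public static void main(String[] args) {
        ToyReadStep step = new ToySource()
                .read("file:///tmp/graph.json")  // hypothetical location
                .with("format", "graphson");     // hypothetical option key
        System.out.println(step);
    }
}
```

Because the resource and its options travel as plain step arguments, nothing
in this shape depends on the Graph API.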

Now, graph providers already know what they would need to do here to
optimize - they would write a strategy to detect a ReadStep or WriteStep
and replace it with their own. An extraordinary example of this will fit in
perfectly with HadoopGraph. HadoopGraph should be able to have a strategy
that detects these steps and then replaces them with VertexProgramStep
containing a CloneVertexProgram. See what that means? We've just unified
Gremlin OLTP bulk read/write with OLAP. That's something we've been
thinking about for years.
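
To make the strategy idea a bit more concrete, here is a minimal sketch in
plain Java. It does not use the real TinkerPop strategy or step classes:
SketchStep, DefaultReadStep, ProviderBulkReadStep, PassThroughStep and
ProviderReadStrategy are hypothetical names invented for illustration. It
only models the "detect the read step and swap in a provider-specific one"
rewrite described above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types are TinkerPop classes.
interface SketchStep {
    String describe();
}

// Default step: stands for loading the resource element by element (OLTP style).
final class DefaultReadStep implements SketchStep {
    final String resource;
    DefaultReadStep(String resource) { this.resource = resource; }
    public String describe() { return "oltp-read(" + resource + ")"; }
}

// Hypothetical provider step that would hand the load to a bulk/OLAP engine.
final class ProviderBulkReadStep implements SketchStep {
    final String resource;
    ProviderBulkReadStep(String resource) { this.resource = resource; }
    public String describe() { return "bulk-read(" + resource + ")"; }
}

// Any other step in the traversal; the strategy leaves it alone.
final class PassThroughStep implements SketchStep {
    final String name;
    PassThroughStep(String name) { this.name = name; }
    public String describe() { return name; }
}

final class ProviderReadStrategy {
    // Returns a new step list in which every DefaultReadStep is replaced by a
    // ProviderBulkReadStep for the same resource; the input list is not modified.
    static List<SketchStep> apply(List<SketchStep> steps) {
        List<SketchStep> out = new ArrayList<>();
        for (SketchStep s : steps) {
            if (s instanceof DefaultReadStep) {
                out.add(new ProviderBulkReadStep(((DefaultReadStep) s).resource));
            } else {
                out.add(s);
            }
        }
        return out;
    }
}

final class StrategySketch {
    public static void main(String[] args) {
        List<SketchStep> plan = new ArrayList<>();
        plan.add(new DefaultReadStep("file:///tmp/graph.kryo"));  // hypothetical location
        plan.add(new PassThroughStep("count"));
        for (SketchStep s : ProviderReadStrategy.apply(plan)) {
            System.out.println(s.describe());
        }
    }
}
```

A graph that registers no such strategy simply keeps the default step, which
is what lets the same traversal run against any graph system.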

Another really cool thing - g.read() and g.write() will work with GLVs out
of the box! The only trick would be that the resource being loaded is
relative to the server not to the client. I really like that things work
out this way, because we further unify the scope of Gremlin functions under
a single syntax and further reduce the reliance on the ScriptEngine for
non-JVM languages.

Just to be clear here - we really shouldn't be losing any functionality
with this change. I'd see us deprecating Graph.io() on the 3.4.0 line so it
would be a good long while before we tried to completely remove that method
and related infrastructure.

Please let me know if there are any questions or concerns. If there are no
objections, I'll start to move forward in this direction and see what kinds
of problems I run into.
