Re: [DISCUSS] Graph.io() deprecation

Stephen Mallette Fri, 13 Jul 2018 03:11:14 -0700

Just an update to say that I've made progress on this concept for

https://issues.apache.org/jira/browse/TINKERPOP-1996


Still a few things left to do, but I have the changes in place and some
tests and it works well. I did have to alter the API slightly to look more
like:

g.io('file.kryo').read()
g.io('file.json').write()

In this way, read()/write() behave more like termination steps, which allows
the use of with() as a configuration step to io() and enables
self-iteration. I have this working for both OLTP and OLAP, which means that
TinkerPop has finally bridged that annoying usage gap and we should thus be
able to close:

https://issues.apache.org/jira/browse/TINKERPOP-550
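
To make the shape of that chain concrete, here is a minimal toy model in
Python. It is only a sketch, not TinkerPop code: IoSketch, SourceSketch, the
with_() spelling, and the "reader"/"graphson" option are all hypothetical
names chosen for illustration. It shows with() acting as configuration on
io(), and read()/write() ending the chain.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class IoSketch:
    """Hypothetical stand-in for the step that g.io(resource) creates."""
    resource: str
    options: Dict[str, object] = field(default_factory=dict)
    mode: Optional[str] = None  # set once by read() or write()

    def with_(self, key: str, value: object) -> "IoSketch":
        # Configuration step: only legal before read()/write() ends the chain.
        if self.mode is not None:
            raise ValueError("with_() must come before read()/write()")
        self.options[key] = value
        return self

    def read(self) -> "IoSketch":
        return self._terminate("read")

    def write(self) -> "IoSketch":
        return self._terminate("write")

    def _terminate(self, mode: str) -> "IoSketch":
        # read()/write() fix the direction of the IO and close the chain.
        if self.mode is not None:
            raise ValueError("io() chain already ended with " + self.mode + "()")
        self.mode = mode
        return self


class SourceSketch:
    """Hypothetical stand-in for a GraphTraversalSource."""

    def io(self, resource: str) -> IoSketch:
        return IoSketch(resource)


g = SourceSketch()
loaded = g.io("file.json").with_("reader", "graphson").read()
saved = g.io("file.kryo").write()
```

In this toy model the configuration has to be complete before read() or
write() is called, which is the ordering the API change above is meant to
allow.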

For OLAP we basically replace the step created via g.io() with one that
uses the CloneVertexProgram allowing users to alter the configurations
(i.e. the Input/OutputFormat) for IO. There are still some odds and ends to
work out, but so far this is looking pretty good.
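
The replacement idea can be sketched the same way. This is a simplified
illustration under assumed names, not the actual implementation: IoStep,
VertexProgramStep (as modelled here), and apply_olap_io_strategy are
hypothetical, and a real strategy would operate on a traversal rather than a
plain list.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IoStep:
    """Hypothetical OLTP step produced by g.io(resource).read()/write()."""
    resource: str
    mode: str  # "read" or "write"


@dataclass
class VertexProgramStep:
    """Hypothetical OLAP step that runs a named vertex program."""
    program: str
    resource: str
    mode: str


def apply_olap_io_strategy(steps: List[object]) -> List[object]:
    """Return a new step list with every IoStep swapped for an OLAP step."""
    rewritten = []
    for step in steps:
        if isinstance(step, IoStep):
            # Same resource and direction, but executed by a vertex program.
            rewritten.append(
                VertexProgramStep("CloneVertexProgram", step.resource, step.mode)
            )
        else:
            # Steps the strategy does not recognize pass through untouched.
            rewritten.append(step)
    return rewritten


oltp_plan = [IoStep("file.kryo", "read")]
olap_plan = apply_olap_io_strategy(oltp_plan)
```

The user-facing call stays the same in both cases; only the step that ends up
executing it differs.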



On Fri, Jun 29, 2018 at 2:13 PM Stephen Mallette <[email protected]>
wrote:

> So, we made the change on:
>
> https://issues.apache.org/jira/browse/TINKERPOP-1985
>
> to deprecate BLVP (BulkLoaderVertexProgram) because we're going to delegate
> bulk loading to graph providers. The issue I now see is that Graph.io() is
> also a form of bulk loading, which means it should probably go the route of
> deprecation if we're following our thinking to the letter. But I don't think
> that "bulk loading" is the only reason to move on from Graph.io(). Here are
> some problems with it from a user's perspective:
>
> 1. The API is terrifying - there are reasons why this API developed the
> way that it did, but I won't dwell on them. The point is that it's
> generally hard to use with all the Builder patterns layering in and out of
> each other. This is especially true if you have to customize some aspect of
> the configuration in some way.
> 2. You can't use it with GLVs.
> 3. It forces reliance on the Graph API, which we try to quietly hide from
> users, and some graphs don't even support that aspect of the API directly.
>
> From the graph provider perspective:
>
> 1. If you don't have a Graph API, you have no way to support this, which is
> bad because users look at the TinkerPop docs, see this io() capability
> all over the place, and get confused.
> 2. You have no way to optimize the loading process.
>
> I won't even go into all the internal reasons I dislike Graph.io() - hehe
>
> So, after careful thought, I think that deprecation makes sense. However,
> I don't think we want to completely lose the convenience of Graph.io().
>
> "But you said that we would delegate all bulk loading to provider tools!"
>
> Well, yes, and I think that idea still holds with the caveat that in
> providing this convenience we make it at least possible for a graph
> provider to optimize on it. So, I'm proposing that we make "io()" a part of
> GraphTraversalSource by adding:
>
> g.read(resource)
> g.write(resource)
>
> where resource is a string representing a URL to some file location. We
> might have other overloads here, but I'm not sure what those are yet. These
> methods would essentially be new steps to the Gremlin language and thus
> allow additional configurations to be passed using our new with() step. By
> default, we can rely on OLTP style reading/writing so this will work for
> any graph system.
>
> Now, graph providers already know what they would need to do here to
> optimize - they would write a strategy to detect a ReadStep or WriteStep
> and replace it with their own. An extraordinary example of this will fit in
> perfectly with HadoopGraph. HadoopGraph should be able to have a strategy
> that detects these steps and then replaces them with a VertexProgramStep
> containing a CloneVertexProgram. See what that means? We've just unified
> Gremlin OLTP bulk read/write with OLAP. That's something we've been
> thinking about for years.
>
> Another really cool thing - g.read() and g.write() will work with GLVs out
> of the box! The only trick would be that the resource being loaded is
> relative to the server not to the client. I really like that things work
> out this way, because we further unify the scope of Gremlin functions under
> a single syntax and further reduce the reliance on the ScriptEngine for
> non-JVM languages.
>
> Just to be clear here - we really shouldn't be losing any functionality
> with this change. I'd see us deprecating Graph.io() on the 3.4.0 line so it
> would be a good long while before we tried to completely remove that method
> and related infrastructure.
>
> Please let me know if there are any questions or concerns. If there are no
> objections, I'll start to move forward in this direction and see what kinds
> of problems I run into.
