Re: [DISCUSS] Graph.io() deprecation

Stephen Mallette Thu, 19 Jul 2018 10:38:39 -0700

I had to drop the idea of using read()/write() as termination steps. It
just didn't work given how much we rely on the existing iterate()
functionality in our infrastructure. With read()/write() as termination
steps, the traversal executes on construction when built from bytecode. I
didn't want to build a bunch of new stuff to change that, and it felt hacky
to do a lot of if-then work to carve out space for them to work the
way I'd previously described. So, we're basically at the same syntax now;
it's just that you need to iterate() your read()/write() traversals, which
looks a little weird, but perhaps we'll get used to it. I didn't bother to
drop the io() step because I sorta like that it creates a single step to
reason about and that it is reminiscent of graph.io() so we can kinda keep
some consistency in the API there.
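
To make the iterate() point concrete, here is a toy sketch in plain Java.
It is not TinkerPop code: ToyTraversal and IterateDemo are hypothetical
names invented for illustration, and the "I/O" is just a string appended to
a list. It only shows the shape of the behavior described above, where
read()/write() record a step and nothing runs until iterate() is called.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only. These are hypothetical classes, NOT part of TinkerPop.
final class ToyTraversal {
    // Side effects that have actually run.
    final List<String> executed = new ArrayList<>();
    // Steps that were recorded but have not run yet.
    private final List<String> pending = new ArrayList<>();
    private String resource;

    // Stands in for g.io(resource): remembers the resource, performs no I/O.
    static ToyTraversal io(String resource) {
        ToyTraversal t = new ToyTraversal();
        t.resource = resource;
        return t;
    }

    // Ordinary (non-terminating) steps: they only record what should happen.
    ToyTraversal read() {
        pending.add("read " + resource);
        return this;
    }

    ToyTraversal write() {
        pending.add("write " + resource);
        return this;
    }

    // The only place where recorded steps are actually executed.
    ToyTraversal iterate() {
        executed.addAll(pending);
        pending.clear();
        return this;
    }
}

class IterateDemo {
    public static void main(String[] args) {
        ToyTraversal t = ToyTraversal.io("file.kryo").read();
        System.out.println(t.executed); // still empty: read() alone did nothing
        t.iterate();
        System.out.println(t.executed); // now contains the read
    }
}
```

In this model, forgetting iterate() silently does nothing, which is the
"looks a little weird" trade-off mentioned above.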



On Fri, Jul 13, 2018 at 6:11 AM Stephen Mallette <[email protected]>
wrote:

> Just an update to say that I've made progress on this concept for
>
> https://issues.apache.org/jira/browse/TINKERPOP-1996
>
> Still a few things left to do, but I have the changes in place and some
> tests and it works well. I did have to alter the API slightly to look more
> like:
>
> g.io('file.kryo').read()
> g.io('file.json').write()
>
> In this way, read()/write() behave more like termination steps which
> allows the use of with() as a configuration step to io() and enables
> self-iteration. I have this working for both OLTP and OLAP which means that
> TinkerPop has finally bridged that annoying usage gap and we should thus be
> able to close:
>
> https://issues.apache.org/jira/browse/TINKERPOP-550
>
> For OLAP we basically replace the step created via g.io() with one that
> uses the CloneVertexProgram, allowing users to alter the configurations
> (i.e. the Input/OutputFormat) for IO. There are still some odds and ends to
> work out, but so far this is looking pretty good.
>
>
>
> On Fri, Jun 29, 2018 at 2:13 PM Stephen Mallette <[email protected]>
> wrote:
>
>> So, we made the change on:
>>
>> https://issues.apache.org/jira/browse/TINKERPOP-1985
>>
>> to deprecate BLVP because we're going to delegate bulk loading to graph
>> providers. The issue I now see is that Graph.io() is also a form of bulk
>> loading which means it should probably go the route of deprecation if we're
>> following our thinking to the letter. But, I don't think that "bulk
>> loading" is the only reason to move on from Graph.io(). Here are some
>> problems with it from a user's perspective:
>>
>> 1. The API is terrifying - there are reasons why this API developed the
>> way that it did, but I won't dwell on them. The point is that it's
>> generally hard to use with all the Builder patterns layering in and out of
>> each other. This is especially true if you have to customize some aspect of
>> the configuration in some way.
>> 2. You can't use it with GLVs.
>> 3. It forces reliance on the Graph API, which we try to quietly hide from
>> users, and some graphs don't even support that aspect of the API directly.
>>
>> From the graph provider perspective:
>>
>> 1. If you don't have a Graph API, you have no way to support this, which is
>> bad because users look at the TinkerPop docs, see this io() capability
>> all over the place, and get confused.
>> 2. You have no way to optimize the loading process.
>>
>> I won't even go into all the internal reasons I dislike Graph.io() - hehe
>>
>> So, after careful thought, I think that deprecation makes sense. However,
>> I don't think we want to completely lose the convenience of Graph.io().
>>
>> "But you said that we would delegate all bulk loading to provider tools!"
>>
>> Well, yes, and I think that idea still holds with the caveat that in
>> providing this convenience we make it at least possible for a graph
>> provider to optimize on it. So, I'm proposing that we make "io()" a part of
>> GraphTraversalSource by adding:
>>
>> g.read(resource)
>> g.write(resource)
>>
>> where resource is a string representing a URL to some file location. We
>> might have other overloads here, but I'm not sure what those are yet. These
>> methods would essentially be new steps to the Gremlin language and thus
>> allow additional configurations to be passed using our new with() step. By
>> default, we can rely on OLTP style reading/writing so this will work for
>> any graph system.
>>
>> Now, graph providers already know what they would need to do here to
>> optimize - they would write a strategy to detect a ReadStep or WriteStep
>> and replace it with their own. An extraordinary example of this will fit in
>> perfectly with HadoopGraph. HadoopGraph should be able to have a strategy
>> that detects these steps and then replaces them with VertexProgramStep
>> containing a CloneVertexProgram. See what that means? We've just unified
>> Gremlin OLTP bulk read/write with OLAP. That's something we've been
>> thinking about for years.
>>
>> Another really cool thing - g.read() and g.write() will work with GLVs
>> out of the box! The only trick would be that the resource being loaded is
>> relative to the server, not to the client. I really like that things work
>> out this way, because we further unify the scope of Gremlin functions under
>> a single syntax and further reduce the reliance on the ScriptEngine for
>> non-JVM languages.
>>
>> Just to be clear here - we really shouldn't be losing any functionality
>> with this change. I'd see us deprecating Graph.io() on the 3.4.0 line so it
>> would be a good long while before we tried to completely remove that method
>> and related infrastructure.
>>
>> Please let me know if there are any questions or concerns. If there are
>> no objections, I'll start to move forward in this direction and see what
>> kinds of problems I run into.
