Re: [DISCUSS] Returning Side Effects

Stephen Mallette Thu, 28 Jul 2016 16:13:01 -0700

I have a rough cut of "returning side-effects" working on the TINKERPOP-1278
branch. I didn't bother making this change for REST at this time as I felt
it was more important and useful to have it run for websockets/NIO, as
the drivers that would ultimately power a RemoteConnection are generally
written for that interface.


gremlin> graph = RemoteGraph.open('conf/remote-graph.properties')
==>remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph=g]]
gremlin> g = graph.traversal()
==>graphtraversalsource[remotegraph[DriverServerConnection-localhost/127.0.0.1:8182 [graph=g]], standard]
gremlin> t = g.V(1).aggregate('a').outE("knows").aggregate("b").inV()
==>v[2]
==>v[4]
gremlin> t.getSideEffects().get('a')
==>v[1]
gremlin> t.getSideEffects().get('b')
==>e[7][1-knows->2]
==>e[8][1-knows->4]

It was more effort than I expected to get this to work, mostly because of my
attempts to do it all without a breaking change. It was also interesting (and
nice) to see that the protocol didn't need to change structurally for this
to work. However, drivers will need to adjust a bit to deal with the
side-effects now streaming back following results. Note that this only
matters for those drivers that support submitting Traversals as Bytecode
(which I assume is "none" of them). Existing script submissions should
still have the same behavior and thus a terminating stream with the final
result (side-effects left on the server as always).

To allow for side-effects to come back I added two pieces of metadata to a
ResponseMessage:

1. sideEffect - which is the side-effect key that the data in the message
belongs to. For instance, in the above example, this would be "a" and "b"
at different points in the stream.
2. aggregateTo - which will be one of map, list, bulkset, or none. The
significance here is that we needed a way to tell the client how a batch
of results should be re-assembled. Recall that Gremlin Server iterates
everything. If you return a String it puts the String into an Iterator for
the response. There needed to be a way to say that a particular side-effect
was converted to an Iterator so that it could be re-assembled (or not) to
what the original type was.
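
To make the aggregateTo idea a bit more concrete, here is a minimal Python
sketch of how a client might rebuild a side-effect from the flat items it
received. This is not the driver code on the branch. The function name
`reassemble` is hypothetical, and so are the assumptions that a map arrives
as (key, value) pairs, that a bulkset arrives as repeated items, and that
"none" means a single value that was wrapped in an Iterator.

```python
from collections import Counter


def reassemble(aggregate_to, items):
    """Rebuild a side-effect value from the flat items streamed by the server.

    `aggregate_to` mirrors the aggregateTo metadata described above and is
    expected to be one of "map", "list", "bulkset" or "none".
    """
    items = list(items)
    if aggregate_to == "list":
        return items
    if aggregate_to == "map":
        # Assumption: each streamed item is a (key, value) pair.
        return dict(items)
    if aggregate_to == "bulkset":
        # Assumption: duplicates are streamed as repeated items; Counter is
        # only a stand-in for a BulkSet (item -> bulk count).
        return Counter(items)
    if aggregate_to == "none":
        # Assumption: a lone value (e.g. a String) was wrapped in an Iterator
        # by the server, so unwrap it when exactly one item came back.
        return items[0] if len(items) == 1 else items
    raise ValueError("unknown aggregateTo value: %r" % (aggregate_to,))


if __name__ == "__main__":
    print(reassemble("none", ["marko"]))
    print(reassemble("bulkset", ["v[1]", "v[1]", "v[2]"]))
```

The point of the sketch is only that the client cannot guess the original
type from the items alone, which is why the extra piece of metadata is needed.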

As for the streaming model, Gremlin Server iterates the results first and
then the side effects by key. Recall that a ResponseMessage batches up
results returned from the server based on iteration size. I've arranged it
so that a ResponseMessage will never mix results with side effects or one
side-effect key with another key. In this way, it's easy to tie the
sideEffect/aggregateTo values to the data within the message. That made it
pretty easy for me to assemble the stream of side-effects into something
useful on the client side.
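
A rough Python sketch of that streaming order, under stated assumptions:
messages are modelled as plain dicts rather than real ResponseMessage
objects, and the names `stream`, `collect` and `batches` are hypothetical.
It only illustrates the rule that a message holds either results or the
data for exactly one side-effect key.

```python
from itertools import islice


def batches(iterable, size):
    """Split an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def stream(results, side_effects, batch_size):
    """Yield result messages first, then side-effect messages key by key.

    `side_effects` maps a key to an (aggregate_to, items) pair.
    """
    for chunk in batches(results, batch_size):
        yield {"data": chunk, "meta": {}}
    for key, (aggregate_to, items) in side_effects.items():
        for chunk in batches(items, batch_size):
            # One message never mixes two keys, so the metadata always
            # describes everything in "data".
            meta = {"sideEffect": key, "aggregateTo": aggregate_to}
            yield {"data": chunk, "meta": meta}


def collect(messages):
    """Client side: separate results from side-effects using the metadata."""
    results, side_effects = [], {}
    for msg in messages:
        key = msg["meta"].get("sideEffect")
        if key is None:
            results.extend(msg["data"])
        else:
            aggregate_to = msg["meta"]["aggregateTo"]
            entry = side_effects.setdefault(key, (aggregate_to, []))
            entry[1].extend(msg["data"])
    return results, side_effects


if __name__ == "__main__":
    # Values loosely follow the console example above, as plain strings.
    msgs = stream(["v[2]", "v[4]"],
                  {"a": ("bulkset", ["v[1]"]),
                   "b": ("bulkset", ["e[7]", "e[8]"])},
                  batch_size=1)
    print(collect(msgs))
```

Because every message is tagged, the client can accumulate each key
independently even when a side-effect spans several batches.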

There is still a lot to do here:

1. Lots of code cleanup to say the least - some of the basic interfaces,
classes, etc. that I added may see some change as I review with a fresh mind
tomorrow.
2. I'd like to make it optional to return side-effects so that drivers or
users can choose to opt out of the expense of sending that information back
if it isn't needed.
3. Piggy-backing on 2, as mentioned earlier in this thread, I think it
would be nice if you could actively state as a user which side-effects you
wanted sent back when you submit the traversal. Not sure where that would
be specified right now given the way everything is hooked together.
4. Documentation is non-existent at this point beyond what I've tried to
lay out in this thread, so I gotta get to that when all the change settles
down. I assume that won't happen until Marko gets back from his time off as
I suspect he'll think of a few extra things to do in making this all work
well.

Anyway, please let me know if there are any thoughts on this approach.





On Fri, Jul 22, 2016 at 6:24 PM, Stephen Mallette <[email protected]>
wrote:

> Yes, I expected to return results first and then stream the side-effects.
>
> On Fri, Jul 22, 2016 at 5:05 PM, Dylan Millikin <[email protected]>
> wrote:
>
>> > Perhaps nicer than doing all that trickery with transactions would be to
>> self-detach the vertex ahead of time
>>
>> This was the original idea, I never dove too deep into it as the
>> sideEffects were applied mid traversal and extra filtering/SEs still had
>> to occur. I wasn't sure it was actually possible and the transaction hack
>> allowed me to move on.
>>
>> As for the GLV limitations, it's mostly going to be network overhead.
>> Unfortunately one round trip with the server is costly and I know that
>> we've ended up having to be creative in order to limit the round trips by
>> concatenating scripts for each query. A GLV approach would need some
>> careful planning and probably a multiline byteCode feature. But I digress;
>> that's not what this thread is about.
>>
>> In the spirit of GLVs returning side effects how would your original
>> proposition stream over the network? Would you get all data first and then
>> SE? I'm guessing you would want to stream the SEs as well.
>>
>> On Fri, Jul 22, 2016 at 4:42 PM, Stephen Mallette <[email protected]>
>> wrote:
>>
>> > > You can take the case of a group count as a really simple example.
>> >
>> > So you want the side-effect in the Vertex itself so you can use it with
>> > the ORM. Interesting. Perhaps nicer than doing all that trickery with
>> > transactions would be to self-detach the vertex ahead of time (i.e.
>> > create a DetachedVertex) and add the property you want. As indirect as
>> > that sounds, that seems more direct to me than the "fake" transaction.
>> > Not sure that what I'm doing here will help you with that problem.
>> >
>> > > I'll add that I'm looking at this from a non-GLV perspective so I'm
>> > disregarding object mapping done through GraphSONv2.0 typing in favor
>> > of a format guaranteed result set (say that either only contains
>> > vertices, edges, or a combination of both).
>> >
>> > Also interesting. Not sure that kind of serialization has a place in
>> > TinkerPop where we encourage folks to return everything under the sun by
>> > using Gremlin to return data in a form that suits their required end
>> > result. If this is the outcome you want, I think that my suggestion with
>> > self-detaching is probably on the right track. Maybe consider a custom
>> > serializer that coerces all results to graph elements. That would take
>> > care of all the embedded objects and the whole lot.
>> >
>> > > The reason for this is that GLV is too
>> > inefficient for larger projects so a more traditional script->result
>> > approach is required.
>> >
>> > I'm hijacking my own thread by going too deep down this path, but I
>> > think we should strive toward a solution for GLVs to be robust enough
>> > for developers to be successful with TinkerPop in the language of their
>> > choice. Just like we'll never get rid of all lambdas in Gremlin, we will
>> > probably never quite get rid of script->result for all use cases (but,
>> > again, like lambdas the goal will be to get quite close). I find it
>> > quite interesting that we might be able to figure out how a Python dev
>> > could write Gremlin in Python that would remotely execute on the server
>> > seamlessly, however it's also interesting that that same GLV code could
>> > be treated as server-side to be accessed from a Python client. In that
>> > way, heavy complex logic (the type you are talking about) could be
>> > written in Python and then accessed from Python on the client. In short,
>> > I think that it would be better to think of the work around GLVs as "how
>> > to make Gremlin good in other languages" rather than the more narrow
>> > view of just "remoting traversals". If we go wider, we might come up
>> > with some good ideas to really broaden access to TinkerPop and graphs in
>> > a very big way.
>> >
>> > We already have a really big improvement with "remoting" as compared to
>> > good 'ol RexsterGraph - so that's something  - haha  ;)
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Fri, Jul 22, 2016 at 3:17 PM, Dylan Millikin <[email protected]>
>> > wrote:
>> >
>> > > Yeah sorry I left out an important part. This is especially an issue
>> > > when you're dealing with an ORM layer that's expecting results of a
>> > > specific type (for example vertices).
>> > > You can take the case of a group count as a really simple example.
>> > > Your result set could be:
>> > >
>> > > [{count:5, vertex:v[1]}, {count:3, vertex:v[2]}, {count:1, vertex:v[3]}]
>> > > and this is easy enough to do with gremlin. But unless this is built
>> > > into the ORM itself chances are you'll need to implement the object
>> > > mapping yourself.
>> > >
>> > > The alternative is to add "count" as a property of vertex and then you
>> > > can leverage all available features from your ORM such as filtering,
>> > > ordering, etc... Actually, the way we did it above we can also do those
>> > > directly in gremlin as well.
>> > >
>> > > This is a simple case, but once it gets more complicated with
>> > > hierarchical data, the option of implementing the object mapping
>> > > yourself is just a headache and often times less efficient than just
>> > > rolling back a transaction.
>> > >
>> > > Dunno if that was clear enough this time around.
>> > >
>> > > I'll add that I'm looking at this from a non-GLV perspective so I'm
>> > > disregarding object mapping done through GraphSONv2.0 typing in favor
>> > > of a format guaranteed result set (say that either only contains
>> > > vertices, edges, or a combination of both). The reason for this is that
>> > > GLV is too inefficient for larger projects so a more traditional
>> > > script->result approach is required.
>> > >
>> > > On Fri, Jul 22, 2016 at 2:09 PM, Stephen Mallette <[email protected]>
>> > > wrote:
>> > >
>> > > > hi dylan, could you please provide a more concrete example of the
>> > > > problem you're facing?
>> > > >
>> > > > On Fri, Jul 22, 2016 at 1:24 PM, Dylan Millikin <[email protected]>
>> > > > wrote:
>> > > >
>> > > > > I'm going to confirm that this is actually a common issue.
>> > > > > One thing to keep in mind is that often times the sideEffects are
>> > > > > directly linked to returned elements on a 1 --> n basis which
>> > > > > neither of the above really help with. That is to say that if
>> > > > > you're streaming your results you'll need the sideEffects that
>> > > > > relate to the streamed element.
>> > > > >
>> > > > > There is no easy way of handling this currently. Especially if you
>> > > > > order your results and get unordered sideEffect results.
>> > > > > One way we've found to work around this is very hacky, not
>> > > > > efficient and only works for non mutating queries:
>> > > > >
>> > > > > - we start a transaction
>> > > > > - we append the sideEffect data to the elements we're emitting (say
>> > > > >   as properties of a vertex)
>> > > > > - get the full result set with sideEffects as properties of the
>> > > > >   result elements.
>> > > > > - rollback transaction so properties are not persisted to the graph.
>> > > > >
>> > > > > A truly wicked succession of events born from absolute desperation.
>> > > > > I enquired a while back about the ability to treat elements as
>> > > > > detached from the graph in order to do the above without the
>> > > > > transaction handling. But I never followed up.
>> > > > >
>> > > > > I figured I would put this out there as another case where non-Java
>> > > > > languages struggle.
>> > > > >
>> > > > > On Thu, Jul 21, 2016 at 1:19 PM, Stephen Mallette <[email protected]>
>> > > > > wrote:
>> > > > >
>> > > > > > Your way made me think that if you wrote your traversal like
>> > > > > > that, you would return the side-effects twice - once in your
>> > > > > > traversal as part of the standard result and then again as a
>> > > > > > side-effect. Not sure what that means - just a thought.
>> > > > > >
>> > > > > > While I'm thinking thoughts that may or may not be obvious, it
>> > > > > > also occurs to me that the downside for a GLV retrieving data that
>> > > > > > way is that the result of the traversal won't be streamed back. It
>> > > > > > will aggregate the result (and the side-effects naturally) in
>> > > > > > memory and then return that all as a whole.
>> > > > > >
>> > > > > > On Thu, Jul 21, 2016 at 11:24 AM, Daniel Kuppitz <[email protected]>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > If you really want to have your result and your side-effects
>> > > > > > > returned by a single request, you could do something like this:
>> > > > > > >
>> > > > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().as("data").select("data", "names", "ages")
>> > > > > > > ==>[data:[v[1], v[2], v[4]], names:[marko, vadas, josh], ages:[29, 27, 32]]
>> > > > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").aggregate("ages").by("age").fold().project("data", "se").by().by(cap("names","ages"))
>> > > > > > > ==>[data:[v[1], v[2], v[4]], se:[names:[marko, vadas, josh], ages:[29, 27, 32]]]
>> > > > > > > gremlin> g.V(1,2,4).aggregate("names").by("name").fold().project("data", "se").by().by(cap("names"))
>> > > > > > > ==>[data:[v[1], v[2], v[4]], se:[marko, vadas, josh]]
>> > > > > > >
>> > > > > > > I'm not saying it would be bad to have Gremlin Server handle that
>> > > > > > > for you, just wanted to show that it's actually pretty easy to get
>> > > > > > > the data and the side-effects without using the traversal admin
>> > > > > > > methods (hence it should work for all GLVs).
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Daniel
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, Jul 21, 2016 at 4:51 PM, Stephen Mallette <[email protected]>
>> > > > > > > wrote:
>> > > > > > >
>> > > > > > > > As we look to build out GLVs and expand Gremlin into other
>> > > > > > > > programming languages, one of the important aspects of doing
>> > > > > > > > this should be to consider consistency across GLVs. We should
>> > > > > > > > try to prevent capabilities of Java from being lost in Python,
>> > > > > > > > JS, etc.
>> > > > > > > >
>> > > > > > > > As we look at both RemoteGraph in Java and gremlin-python we
>> > > > > > > > find that there is no way to get traversal side-effects. If you
>> > > > > > > > write a Traversal and want side-effects from it, you have to
>> > > > > > > > write your traversal to return them so that it comes back as
>> > > > > > > > part of the result set. Since RemoteGraph and gremlin-python
>> > > > > > > > don't really allow you to directly "submit a script" it's not as
>> > > > > > > > though you can execute a traversal once for both the result and
>> > > > > > > > the side-effect and package them together in a single request as
>> > > > > > > > you might do with a simple script request:
>> > > > > > > >
>> > > > > > > > $ curl -X POST -d "{\"gremlin\":\"t=g.V(1).values('name').aggregate('x');[v:t.toList(),se:t.getSideEffects().get('x')]\"}" http://localhost:8182
>> > > > > > > > {"requestId":"3d3258b2-e421-459a-bf53-ea1e58ece4aa","status":{"message":"","code":200,"attributes":{}},"result":{"data":[{"v":["marko"]},{"se":["marko"]}],"meta":{}}}
>> > > > > > > >
>> > > > > > > > I'm thinking that we could alter things in a non-breaking way to
>> > > > > > > > allow optional return of side-effect data so that there is a way
>> > > > > > > > to have this all streamed back without the need for the little
>> > > > > > > > workaround I just demonstrated. For REST I think we could just
>> > > > > > > > include a sideEffect request parameter that allowed for a list
>> > > > > > > > of side-effect keys to return. Perhaps a "*" could indicate
>> > > > > > > > that all should be returned. The side-effects could be
>> > > > > > > > serialized into a key sibling to "data" called "sideEffect".
>> > > > > > > >
>> > > > > > > > I think a similar approach could be used for websockets and NIO
>> > > > > > > > where we could amend the protocol to accept that sideEffect
>> > > > > > > > parameter. We would first stream results (marked with meta data
>> > > > > > > > to specify a "result") and then stream side effects (again
>> > > > > > > > marked with meta data as such).
>> > > > > > > >
>> > > > > > > > I considered caching the Traversal instances so that a future
>> > > > > > > > request could get the side effects, but for a variety of reasons
>> > > > > > > > I abandoned that (the cache meant more heap and trying to get
>> > > > > > > > the right balance, new transactions would have to be opened if
>> > > > > > > > the side-effect contained graph elements, etc.)
>> > > > > > > >
>> > > > > > > > I like the approach of just maintaining our single
>> > > > > > > > request-response model with the changes I proposed above. It
>> > > > > > > > seems to provide the least impact with no new dependencies, is
>> > > > > > > > backward compatible and could be completely optional to
>> > > > > > > > RemoteConnections.
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>
