RE: [DSISCUSS] Remote Transactions

Kelvin Lawrence Sat, 20 Mar 2021 07:34:49 -0700

Just a quick comment to add to what others have said. This functionality has 
been a common ask by users who do not want to have to fall back to sending text 
queries just to get the semantics of a session. Thanks for taking this on. It 
will be a nice addition to the Gremlin client/server interaction model. The 
syntax you proposed looks good to me. I assume there will also be an explicit 
rollback ?


Cheers,
Kelvin

Kelvin R. Lawrence
http://www.kelvinlawrence.net
https://www.linkedin.com/in/krlawrence/
Twitter: @gfxman

From: Stephen Mallette
Sent: Thursday, March 18, 2021 7:16 PM
To: [email protected]
Subject: Re: [DSISCUSS] Remote Transactions

I've created this issue for tracking:

https://issues.apache.org/jira/browse/TINKERPOP-2537

and have pushed a branch for early review:

https://github.com/apache/tinkerpop/tree/TINKERPOP-2537

The server-side is a bit of a copy/paste mess at the moment, so try not to
pay attention there too much and focus more on the driver and Transaction
interfaces side of things. At this point I'm amazed at how easily this fell
together. It feels like I'm almost overlooking something important as I'd
not have guessed this to go in so straightforwardly. The hardest part was
figuring out the interaction among the various interfaces to get the user
API right but once that was untangled, stuff just started working.

At this point, I think I need to pay some attention to the server-side
to make this pull request ready. I basically cut/paste a giant grip of code
from TraversalOpProcessor into SessionOpProcessor and tweaked a few lines
here and there. Not terribly ideal, but it was the fastest way to be sure
that the general premise of the client-side was going to actually work. Now
the question is how to make the server part nice, but refactoring there
isn't traditionally easy unfortunately.




On Thu, Mar 18, 2021 at 12:53 PM Stephen Mallette <[email protected]>
wrote:

> First of all, I managed to fumble my last code sample - sorry. You can
> either use the Transaction object explicitly like this:
>
> g = traversal().withRemote(conn)
> tx = g.tx() // Transaction object
> gtx = tx.begin()  // GraphTraversalSource
> assert tx.isOpen() == true
> gtx.addV('person').iterate()
> gtx.addV('software').iterate()
> tx.commit() // alternatively you could explicitly rollback()
> assert tx.isOpen() == false
>
> Or you can look more like the old style:
>
> g = traversal().withRemote(conn)
> gtx = g.tx().begin()  // GraphTraversalSource
> assert gtx.tx().isOpen() == true
> gtx.addV('person').iterate()
> gtx.addV('software').iterate()
> gtx.tx().commit() // alternatively you could explicitly rollback()
> assert gtx.tx().isOpen() == false
>
> Both look fairly fluid I think and I'm especially pleased with the second
> one. Need to do a bit more to prevent weird nested transactions and such
> but pretty neat for a just a day's worth of hacking about.
>
> Answers to Divij's questions inline below:
>
> On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <[email protected]>
> wrote:
>
>> Great idea Stephen. I agree with standardizing the explicit transaction
>> semantics in cases of remote server (implicit aka sessionless is already
>> self explanatory) and unifying the syntax with embedded graph usage.
>>
>> Couple of notes:
>>
>> 1. I would favor the begin() instead of create() as it's closer to
>> universally prominent SQL transaction syntax. This would reduce the
>> learning curve for users of TP.
>>
>
> I'm ok with begin().
>
>
>> 2. From an implementation standpoint, sessionless requests are queued on
>> the server side and are serviced by the worker thread pool. But a
>> session-ful aka explicit transaction aka managed transaction starts a new
>> single worked thread pool every time. In a scenario where we have both
>> session-less and sesion-ful requests being made to the server, the
>> resource
>> allocation will not be fair and may lead to starvation for some requests.
>> This problem exists even today, hence not totally correlated to what you
>> are proposing but I am afraid a wider adoption of explicit
>> transactions will bring this problem to the spotlight. Hence, we should
>> fix
>> this problem first. A solution would entail converging the worker pool for
>> both session vs session-less requests.
>>
>> 3. You are proposing the idea of having a transaction scoped traversal
>> object. I agree with it but we need more clarification in behavior wrt the
>> following scenarios:
>>
>> Q. What happens when g.tx().commit() is called on a transaction scoped
>> traversal object? Does the traversal get closed?
>>
>
> If you call gtx.tx().commit() then gtx is dead. If you try do another
> traversal with that it will throw an exception. If you call gtx.close() it
> will close the session and then whatever default handling by the server
> will come into play to commit or rollback. if you call g.tx().commit(),
> meaning on the "parent" to gtx, then you basically send a commit operation
> to the server and that's that.
>
> Q. Currently, the same traversal object could be used to execute multiple
>> session-less requests simultaneously in a thread safe manner. Does the
>> same
>> behaviour apply to transaction scoped traversal?
>
>
> Yes. I would expect similar behavior there, though as you note below they
> will just get processed in the server serially as they arrive and not in
> parallel.
>
>
>> If not, then how do I send
>> multiple requests in parallel from the client all scoped to the same
>> transaction on the server? Note that this is different from case of multi
>> threaded transactions because on the server all requests are scoped to
>> single transaction i.e. single thread but on the client they may be
>> submitted via multiple threads.
>>
>> Regards,
>> Divij Vaidya
>>
>>
>>
>> On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <[email protected]>
>> wrote:
>>
>> > The Transaction object is a bit important with remote cases because it
>> > holds the reference to the session. With embedded use cases we generally
>> > adhere to ThreadLocal transactions so the Transaction implementation in
>> > that case is more of a controller for the current thread but for remote
>> > cases the Transaction implementation holds a bit of state that can cross
>> > over threads. In some ways, that makes remote cases feel like a
>> "threaded
>> > transaction" so that may be familiar to users?? Here's some example
>> > syntax I currently have working in a test case:
>> >
>> > g = traversal().withRemote(conn)
>> > gtx = g.tx().create()
>> > assert gtx.isOpen() == true
>> > gtx.addV('person').iterate()
>> > gtx.addV('software').iterate()
>> > gtx.commit() // alternatively you could explicitly rollback()
>> > assert gtx.isOpen() == false
>> >
>> > I hope that documentation changes are enough to unify the syntax and
>> remove
>> > confusion despite there still being ThreadLocal transactions as a
>> default
>> > for embedded cases and something else for remote. At least they will
>> look
>> > the same, even if you technically don't need to do a g.tx().create() for
>> > embedded transactions and can just have the control you always had with
>> > g.tx().commit() directly.
>> >
>> > Note that the remote Transaction object can support configuration via
>> > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the default
>> of
>> > "commit on close" to rollback or manual (or i suppose something custom).
>> > It's nice that this piece works. I don't see a point in supporting
>> > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in remote
>> > contexts.
>> >
>> > Another point is that I've added what I termed a "Graph Operation"
>> bytecode
>> > instances which is bytecode that isn't related to traversal
>> construction. I
>> > hope this isn't one of those things where we end up wanting to
>> deprecate an
>> > idea as fast as we added it but we needed some way to pass
>> commit/rollback
>> > commands and they aren't really part of traversal construction. They are
>> > operations that execute on the graph itself and I couldn't think of how
>> to
>> > treat them as traversals nicely, so they sorta just exist as a one off.
>> > Perhaps they will grow in number?? Folks have always asked if bytecode
>> > requests could get "GraphFeature" information - I suppose this could be
>> a
>> > way to do that??
>> >
>> > Anyway, I will keep going down this general path as it's appearing
>> > relatively fruitful.
>> >
>> >
>> >
>> >
>> > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <[email protected]>
>> > wrote:
>> >
>> > > > We should just communicate clearly that a simple remote traversal
>> > > already uses a transaction by default,
>> > >
>> > > That's a good point because even with this change we still have a
>> > > situation where a single iterated remote traversal will generally
>> mean a
>> > > server managed commit/rollback, but in embedded mode, we have an open
>> > > transaction (for transaction enabled graphs - TinkerGraph will behave
>> > more
>> > > like a remote traversal, so more confusion there i guess). I'm not
>> sure
>> > how
>> > > to rectify that at this time except by way of documentation.
>> > >
>> > > The only thing I can think of is that the default for embedded would
>> have
>> > > to be auto-commit per traversal. In that way it would work like remote
>> > > traversals and for graphs that don't support transactions like
>> > TinkerGraph.
>> > > Of course, that's the kind of change that will break a lot of code.
>> Maybe
>> > > we just keep that idea for another version.
>> > >
>> > > On Thu, Mar 18, 2021 at 4:21 AM <[email protected]> wrote:
>> > >
>> > >> I like the idea of adding transactions for remote traversals as they
>> > >> close a gap in functionality that we currently have for remote
>> > traversals.
>> > >> We should just communicate clearly that a simple remote traversal
>> > already
>> > >> uses a transaction by default, meaning that the server will roll back
>> > any
>> > >> changes if any exception occurs. Users often ask about how to do
>> > >> transactions with a remote traversal because they don't know about
>> this
>> > and
>> > >> I'm afraid that we might add even more confusion if we now add
>> > transactions
>> > >> for remote traversals. Hence why I think that we should communicate
>> this
>> > >> clearly when we add remote transactions.
>> > >>
>> > >> Getting this to work in the GLVs should be possible but will require
>> > some
>> > >> effort. I think we would have to introduce some
>> > >> "DriverRemoteTransactionTraversal" that doesn't submit the traversal
>> on
>> > a
>> > >> terminal step but saves it and then submits all saved traversals
>> > together
>> > >> on close().
>> > >> This also means that we should directly add an async version of
>> close(),
>> > >> maybe closeAsync()? (Same for commit() and rollback())
>> > >>
>> > >> I also like create() better than traversal() because it would
>> confuse me
>> > >> to first start a traversal with traversal() and then also start a
>> > >> transaction with the same method.
>> > >>
>> > >> -----Ursprüngliche Nachricht-----
>> > >> Von: Stephen Mallette <[email protected]>
>> > >> Gesendet: Donnerstag, 18. März 2021 00:01
>> > >> An: [email protected]
>> > >> Betreff: Re: [DSISCUSS] Remote Transactions
>> > >>
>> > >> One thing I should have noted about tx().create(). The create() very
>> > well
>> > >> could have been named traversal(), thus the previous example reading
>> as:
>> > >>
>> > >> g = traversal().withEmbedded(graph)
>> > >> // or
>> > >> g = traversal().withRemote(conn)
>> > >> gtx = g.tx().traversal()
>> > >> gtx.addV('person').iterate()
>> > >> gtx.addV('software').iterate()
>> > >> gtx.close() // alternatively you could explicitly commit() or
>> rollback()
>> > >>
>> > >> You're basically spawning a new GraphTraversalSource from the
>> > Transaction
>> > >> object rather than from a Graph or AnonymousTraversal. I chose
>> create()
>> > >> because it felt like this looked weird:
>> > >>
>> > >> g = traversal().withRemote(conn).tx().traversal()
>> > >>
>> > >> which would be a weird thing to do I guess, but just seeing
>> > "traversal()"
>> > >> over and over seemed odd looking and I wanted to differentiate with
>> the
>> > >> Transaction object even though the methods are identical in what they
>> > do. I
>> > >> suppose I also drew inspiration from:
>> > >>
>> > >> Transaction.createdThreadedTx()
>> > >>
>> > >> which I think we might consider deprecating. JanusGraph would simply
>> > >> expose their own Transaction object in the future that has that
>> method
>> > as I
>> > >> imagine folks still use that feature. As far as I know, no other
>> graph
>> > >> implements that functionality and my guess is that no one will likely
>> > do so
>> > >> in the future.
>> > >>
>> > >> anyway, if you like traversal() better than create() or the other way
>> > >> around, please let me know
>> > >>
>> > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>> [email protected]>
>> > >> wrote:
>> > >>
>> > >> > I am in favor of this change as I any idea that unifies the
>> multiple
>> > >> > different ways things work in TP will only make it easier to learn
>> and
>> > >> > adopt.
>> > >> >
>> > >> > I don't know that I have enough knowledge of the inner workings of
>> > >> > transactions to know what/if this will cause problems.
>> > >> >
>> > >> > Dave
>> > >> >
>> > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>> > >> > <[email protected]>
>> > >> > wrote:
>> > >> >
>> > >> > > We haven't touched "transactions" since TP3 was originally
>> designed.
>> > >> > > They have remained a feature for embedded use cases even in the
>> face
>> > >> > > of the
>> > >> > rise
>> > >> > > of remote graph use cases and result in major inconsistencies
>> that
>> > >> > > really bother users.
>> > >> > >
>> > >> > > As we close on 3.5.0, I figured that perhaps there was a chance
>> to
>> > >> > > do something radical about transactions. Basically, I'd like
>> > >> > > transactions to work irrespective of remote or embedded usage and
>> > >> > > for them to have the
>> > >> > same
>> > >> > > API when doing so. In mulling it over for the last day or so I
>> had a
>> > >> > > realization that makes me believe that the following is possible:
>> > >> > >
>> > >> > > g = traversal().withEmbedded(graph)
>> > >> > > // or
>> > >> > > g = traversal().withRemote(conn)
>> > >> > > gtx = g.tx().create()
>> > >> > > gtx.addV('person').iterate()
>> > >> > > gtx.addV('software').iterate()
>> > >> > > gtx.close() // alternatively you could explicitly commit() or
>> > >> > > rollback()
>> > >> > >
>> > >> > > // you could still use g for sessionless, but gtx is "done"
>> after //
>> > >> > > you close so you will need to create() a new gtx instance for a
>> //
>> > >> > > fresh transaction assert 2 == g.V().count().next()
>> > >> > >
>> > >> > > Note that the create() method on tx() is the new bit of necessary
>> > >> syntax.
>> > >> > > Technically, it is a do-nothing for embedded mode (and you could
>> > >> > > skip it for thread-local auto-transactions) but all the
>> > >> > > documentation can be shifted so that the API is identical to
>> remote.
>> > >> > > The change would be non-breaking as the embedded transaction
>> > >> > > approach would remain as it is, but would no longer be
>> documented as
>> > >> > > the preferred approach. Perhaps we could one day disallow it but
>> > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we want
>> to
>> > >> touch in TP3.
>> > >> > >
>> > >> > > What happens behind the scenes of g.tx().create() is that in
>> remote
>> > >> > > cases the RemoteConnection constructs a remote Transaction
>> > >> > > implementation which basically is used to spawn new traversal
>> source
>> > >> > > instances with a session based connections. The remote
>> Transaction
>> > >> > > object can't support
>> > >> > transaction
>> > >> > > listeners or special read-write/close configurations, but I don't
>> > >> > > think that's a problem as those things don't make a lot of sense
>> in
>> > >> > > remote use cases.
>> > >> > >
>> > >> > > From the server perspective, this change would also mean that
>> > >> > > sessions would have to accept bytecode and not just scripts,
>> which
>> > >> > > technically shouldn't be a problem.
>> > >> > >
>> > >> > > One downside at the moment is that i'm thinking mostly in Java at
>> > >> > > the moment. I'm not sure how this all works in other variants
>> just
>> > >> > > yet, but
>> > >> > I'd
>> > >> > > hope to keep similar syntax.
>> > >> > >
>> > >> > > I will keep experimenting tomorrow. If you have any thoughts on
>> the
>> > >> > matter,
>> > >> > > I'd be happy to hear them. Thanks!
>> > >> > >
>> > >> >
>> > >>
>> > >>
>> >
>>
>

RE: [DSISCUSS] Remote Transactions

Reply via email to