Re: [DSISCUSS] Remote Transactions

Stephen Mallette Fri, 26 Mar 2021 05:08:09 -0700

I'd left this work temporarily to focus on the UnifiedChannelizer.
Ultimately, that body of work seems to satisfy the worries I had about the
copy/paste approach I took to get things working here with the old
OpProcessor stuff. Given that the UnifiedChannelizer will eventually
replace OpProcessor, I think we can live with the copy/paste approach until
the UnifiedChannelizer becomes the default, which I hope we can see firm up
over 3.5.x. If that's ok with everyone I think I will focus on firming up
documentation and testing for remote g.tx() and get a PR issued.


On Mon, Mar 22, 2021 at 6:14 AM Stephen Mallette <[email protected]>
wrote:

> >  I assume there will also be an explicit rollback ?
>
> yes, you could call rollback() directly (I think that's what you're
> asking):
>
> g = traversal().withRemote(conn)
> gtx = g.tx().create()
> assert gtx.isOpen() == true
> gtx.addV('person').iterate()
> gtx.addV('software').iterate()
> gtx.rollback() // alternatively you could explicitly commit()
> assert gtx.isOpen() == false
>
>
>
>
> On Sat, Mar 20, 2021 at 10:34 AM Kelvin Lawrence <[email protected]>
> wrote:
>
>> Just a quick comment to add to what others have said. This functionality
>> has been a common ask by users who do not want to have to fall back to
>> sending text queries just to get the semantics of a session. Thanks for
>> taking this on. It will be a nice addition to the Gremlin client/server
>> interaction model. The syntax you proposed looks good to me. I assume there
>> will also be an explicit rollback ?
>>
>> Cheers,
>> Kelvin
>>
>> Kelvin R. Lawrence
>> http://www.kelvinlawrence.net
>> https://www.linkedin.com/in/krlawrence/
>> Twitter: @gfxman
>>
>> From: Stephen Mallette
>> Sent: Thursday, March 18, 2021 7:16 PM
>> To: [email protected]
>> Subject: Re: [DSISCUSS] Remote Transactions
>>
>> I've created this issue for tracking:
>>
>> https://issues.apache.org/jira/browse/TINKERPOP-2537
>>
>> and have pushed a branch for early review:
>>
>> https://github.com/apache/tinkerpop/tree/TINKERPOP-2537
>>
>> The server-side is a bit of a copy/paste mess at the moment, so try not to
>> pay attention there too much and focus more on the driver and Transaction
>> interfaces side of things. At this point I'm amazed at how easily this
>> fell
>> together. It feels like I'm almost overlooking something important as I'd
>> not have guessed this to go in so straightforwardly. The hardest part was
>> figuring out the interaction among the various interfaces to get the user
>> API right but once that was untangled, stuff just started working.
>>
>> At this point, I think I need to pay some attention to the server-side
>> to make this pull request ready. I basically cut/paste a giant grip of
>> code
>> from TraversalOpProcessor into SessionOpProcessor and tweaked a few lines
>> here and there. Not terribly ideal, but it was the fastest way to be sure
>> that the general premise of the client-side was going to actually work.
>> Now
>> the question is how to make the server part nice, but refactoring there
>> isn't traditionally easy unfortunately.
>>
>>
>>
>>
>> On Thu, Mar 18, 2021 at 12:53 PM Stephen Mallette <[email protected]>
>> wrote:
>>
>> > First of all, I managed to fumble my last code sample - sorry. You can
>> > either use the Transaction object explicitly like this:
>> >
>> > g = traversal().withRemote(conn)
>> > tx = g.tx() // Transaction object
>> > gtx = tx.begin()  // GraphTraversalSource
>> > assert tx.isOpen() == true
>> > gtx.addV('person').iterate()
>> > gtx.addV('software').iterate()
>> > tx.commit() // alternatively you could explicitly rollback()
>> > assert tx.isOpen() == false
>> >
>> > Or you can look more like the old style:
>> >
>> > g = traversal().withRemote(conn)
>> > gtx = g.tx().begin()  // GraphTraversalSource
>> > assert gtx.tx().isOpen() == true
>> > gtx.addV('person').iterate()
>> > gtx.addV('software').iterate()
>> > gtx.tx().commit() // alternatively you could explicitly rollback()
>> > assert gtx.tx().isOpen() == false
>> >
>> > Both look fairly fluid I think and I'm especially pleased with the
>> second
>> > one. Need to do a bit more to prevent weird nested transactions and such
>> > but pretty neat for a just a day's worth of hacking about.
>> >
>> > Answers to Divij's questions inline below:
>> >
>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <[email protected]>
>> > wrote:
>> >
>> >> Great idea Stephen. I agree with standardizing the explicit transaction
>> >> semantics in cases of remote server (implicit aka sessionless is
>> already
>> >> self explanatory) and unifying the syntax with embedded graph usage.
>> >>
>> >> Couple of notes:
>> >>
>> >> 1. I would favor the begin() instead of create() as it's closer to
>> >> universally prominent SQL transaction syntax. This would reduce the
>> >> learning curve for users of TP.
>> >>
>> >
>> > I'm ok with begin().
>> >
>> >
>> >> 2. From an implementation standpoint, sessionless requests are queued
>> on
>> >> the server side and are serviced by the worker thread pool. But a
>> >> session-ful aka explicit transaction aka managed transaction starts a
>> new
>> >> single worked thread pool every time. In a scenario where we have both
>> >> session-less and sesion-ful requests being made to the server, the
>> >> resource
>> >> allocation will not be fair and may lead to starvation for some
>> requests.
>> >> This problem exists even today, hence not totally correlated to what
>> you
>> >> are proposing but I am afraid a wider adoption of explicit
>> >> transactions will bring this problem to the spotlight. Hence, we should
>> >> fix
>> >> this problem first. A solution would entail converging the worker pool
>> for
>> >> both session vs session-less requests.
>> >>
>> >> 3. You are proposing the idea of having a transaction scoped traversal
>> >> object. I agree with it but we need more clarification in behavior wrt
>> the
>> >> following scenarios:
>> >>
>> >> Q. What happens when g.tx().commit() is called on a transaction scoped
>> >> traversal object? Does the traversal get closed?
>> >>
>> >
>> > If you call gtx.tx().commit() then gtx is dead. If you try do another
>> > traversal with that it will throw an exception. If you call gtx.close()
>> it
>> > will close the session and then whatever default handling by the server
>> > will come into play to commit or rollback. if you call g.tx().commit(),
>> > meaning on the "parent" to gtx, then you basically send a commit
>> operation
>> > to the server and that's that.
>> >
>> > Q. Currently, the same traversal object could be used to execute
>> multiple
>> >> session-less requests simultaneously in a thread safe manner. Does the
>> >> same
>> >> behaviour apply to transaction scoped traversal?
>> >
>> >
>> > Yes. I would expect similar behavior there, though as you note below
>> they
>> > will just get processed in the server serially as they arrive and not in
>> > parallel.
>> >
>> >
>> >> If not, then how do I send
>> >> multiple requests in parallel from the client all scoped to the same
>> >> transaction on the server? Note that this is different from case of
>> multi
>> >> threaded transactions because on the server all requests are scoped to
>> >> single transaction i.e. single thread but on the client they may be
>> >> submitted via multiple threads.
>> >>
>> >> Regards,
>> >> Divij Vaidya
>> >>
>> >>
>> >>
>> >> On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <[email protected]
>> >
>> >> wrote:
>> >>
>> >> > The Transaction object is a bit important with remote cases because
>> it
>> >> > holds the reference to the session. With embedded use cases we
>> generally
>> >> > adhere to ThreadLocal transactions so the Transaction implementation
>> in
>> >> > that case is more of a controller for the current thread but for
>> remote
>> >> > cases the Transaction implementation holds a bit of state that can
>> cross
>> >> > over threads. In some ways, that makes remote cases feel like a
>> >> "threaded
>> >> > transaction" so that may be familiar to users?? Here's some example
>> >> > syntax I currently have working in a test case:
>> >> >
>> >> > g = traversal().withRemote(conn)
>> >> > gtx = g.tx().create()
>> >> > assert gtx.isOpen() == true
>> >> > gtx.addV('person').iterate()
>> >> > gtx.addV('software').iterate()
>> >> > gtx.commit() // alternatively you could explicitly rollback()
>> >> > assert gtx.isOpen() == false
>> >> >
>> >> > I hope that documentation changes are enough to unify the syntax and
>> >> remove
>> >> > confusion despite there still being ThreadLocal transactions as a
>> >> default
>> >> > for embedded cases and something else for remote. At least they will
>> >> look
>> >> > the same, even if you technically don't need to do a g.tx().create()
>> for
>> >> > embedded transactions and can just have the control you always had
>> with
>> >> > g.tx().commit() directly.
>> >> >
>> >> > Note that the remote Transaction object can support configuration via
>> >> > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the
>> default
>> >> of
>> >> > "commit on close" to rollback or manual (or i suppose something
>> custom).
>> >> > It's nice that this piece works. I don't see a point in supporting
>> >> > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in
>> remote
>> >> > contexts.
>> >> >
>> >> > Another point is that I've added what I termed a "Graph Operation"
>> >> bytecode
>> >> > instances which is bytecode that isn't related to traversal
>> >> construction. I
>> >> > hope this isn't one of those things where we end up wanting to
>> >> deprecate an
>> >> > idea as fast as we added it but we needed some way to pass
>> >> commit/rollback
>> >> > commands and they aren't really part of traversal construction. They
>> are
>> >> > operations that execute on the graph itself and I couldn't think of
>> how
>> >> to
>> >> > treat them as traversals nicely, so they sorta just exist as a one
>> off.
>> >> > Perhaps they will grow in number?? Folks have always asked if
>> bytecode
>> >> > requests could get "GraphFeature" information - I suppose this could
>> be
>> >> a
>> >> > way to do that??
>> >> >
>> >> > Anyway, I will keep going down this general path as it's appearing
>> >> > relatively fruitful.
>> >> >
>> >> >
>> >> >
>> >> >
>> >> > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <
>> [email protected]>
>> >> > wrote:
>> >> >
>> >> > > > We should just communicate clearly that a simple remote traversal
>> >> > > already uses a transaction by default,
>> >> > >
>> >> > > That's a good point because even with this change we still have a
>> >> > > situation where a single iterated remote traversal will generally
>> >> mean a
>> >> > > server managed commit/rollback, but in embedded mode, we have an
>> open
>> >> > > transaction (for transaction enabled graphs - TinkerGraph will
>> behave
>> >> > more
>> >> > > like a remote traversal, so more confusion there i guess). I'm not
>> >> sure
>> >> > how
>> >> > > to rectify that at this time except by way of documentation.
>> >> > >
>> >> > > The only thing I can think of is that the default for embedded
>> would
>> >> have
>> >> > > to be auto-commit per traversal. In that way it would work like
>> remote
>> >> > > traversals and for graphs that don't support transactions like
>> >> > TinkerGraph.
>> >> > > Of course, that's the kind of change that will break a lot of code.
>> >> Maybe
>> >> > > we just keep that idea for another version.
>> >> > >
>> >> > > On Thu, Mar 18, 2021 at 4:21 AM <[email protected]> wrote:
>> >> > >
>> >> > >> I like the idea of adding transactions for remote traversals as
>> they
>> >> > >> close a gap in functionality that we currently have for remote
>> >> > traversals.
>> >> > >> We should just communicate clearly that a simple remote traversal
>> >> > already
>> >> > >> uses a transaction by default, meaning that the server will roll
>> back
>> >> > any
>> >> > >> changes if any exception occurs. Users often ask about how to do
>> >> > >> transactions with a remote traversal because they don't know about
>> >> this
>> >> > and
>> >> > >> I'm afraid that we might add even more confusion if we now add
>> >> > transactions
>> >> > >> for remote traversals. Hence why I think that we should
>> communicate
>> >> this
>> >> > >> clearly when we add remote transactions.
>> >> > >>
>> >> > >> Getting this to work in the GLVs should be possible but will
>> require
>> >> > some
>> >> > >> effort. I think we would have to introduce some
>> >> > >> "DriverRemoteTransactionTraversal" that doesn't submit the
>> traversal
>> >> on
>> >> > a
>> >> > >> terminal step but saves it and then submits all saved traversals
>> >> > together
>> >> > >> on close().
>> >> > >> This also means that we should directly add an async version of
>> >> close(),
>> >> > >> maybe closeAsync()? (Same for commit() and rollback())
>> >> > >>
>> >> > >> I also like create() better than traversal() because it would
>> >> confuse me
>> >> > >> to first start a traversal with traversal() and then also start a
>> >> > >> transaction with the same method.
>> >> > >>
>> >> > >> -----Ursprüngliche Nachricht-----
>> >> > >> Von: Stephen Mallette <[email protected]>
>> >> > >> Gesendet: Donnerstag, 18. März 2021 00:01
>> >> > >> An: [email protected]
>> >> > >> Betreff: Re: [DSISCUSS] Remote Transactions
>> >> > >>
>> >> > >> One thing I should have noted about tx().create(). The create()
>> very
>> >> > well
>> >> > >> could have been named traversal(), thus the previous example
>> reading
>> >> as:
>> >> > >>
>> >> > >> g = traversal().withEmbedded(graph)
>> >> > >> // or
>> >> > >> g = traversal().withRemote(conn)
>> >> > >> gtx = g.tx().traversal()
>> >> > >> gtx.addV('person').iterate()
>> >> > >> gtx.addV('software').iterate()
>> >> > >> gtx.close() // alternatively you could explicitly commit() or
>> >> rollback()
>> >> > >>
>> >> > >> You're basically spawning a new GraphTraversalSource from the
>> >> > Transaction
>> >> > >> object rather than from a Graph or AnonymousTraversal. I chose
>> >> create()
>> >> > >> because it felt like this looked weird:
>> >> > >>
>> >> > >> g = traversal().withRemote(conn).tx().traversal()
>> >> > >>
>> >> > >> which would be a weird thing to do I guess, but just seeing
>> >> > "traversal()"
>> >> > >> over and over seemed odd looking and I wanted to differentiate
>> with
>> >> the
>> >> > >> Transaction object even though the methods are identical in what
>> they
>> >> > do. I
>> >> > >> suppose I also drew inspiration from:
>> >> > >>
>> >> > >> Transaction.createdThreadedTx()
>> >> > >>
>> >> > >> which I think we might consider deprecating. JanusGraph would
>> simply
>> >> > >> expose their own Transaction object in the future that has that
>> >> method
>> >> > as I
>> >> > >> imagine folks still use that feature. As far as I know, no other
>> >> graph
>> >> > >> implements that functionality and my guess is that no one will
>> likely
>> >> > do so
>> >> > >> in the future.
>> >> > >>
>> >> > >> anyway, if you like traversal() better than create() or the other
>> way
>> >> > >> around, please let me know
>> >> > >>
>> >> > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <
>> >> [email protected]>
>> >> > >> wrote:
>> >> > >>
>> >> > >> > I am in favor of this change as I any idea that unifies the
>> >> multiple
>> >> > >> > different ways things work in TP will only make it easier to
>> learn
>> >> and
>> >> > >> > adopt.
>> >> > >> >
>> >> > >> > I don't know that I have enough knowledge of the inner workings
>> of
>> >> > >> > transactions to know what/if this will cause problems.
>> >> > >> >
>> >> > >> > Dave
>> >> > >> >
>> >> > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette
>> >> > >> > <[email protected]>
>> >> > >> > wrote:
>> >> > >> >
>> >> > >> > > We haven't touched "transactions" since TP3 was originally
>> >> designed.
>> >> > >> > > They have remained a feature for embedded use cases even in
>> the
>> >> face
>> >> > >> > > of the
>> >> > >> > rise
>> >> > >> > > of remote graph use cases and result in major inconsistencies
>> >> that
>> >> > >> > > really bother users.
>> >> > >> > >
>> >> > >> > > As we close on 3.5.0, I figured that perhaps there was a
>> chance
>> >> to
>> >> > >> > > do something radical about transactions. Basically, I'd like
>> >> > >> > > transactions to work irrespective of remote or embedded usage
>> and
>> >> > >> > > for them to have the
>> >> > >> > same
>> >> > >> > > API when doing so. In mulling it over for the last day or so I
>> >> had a
>> >> > >> > > realization that makes me believe that the following is
>> possible:
>> >> > >> > >
>> >> > >> > > g = traversal().withEmbedded(graph)
>> >> > >> > > // or
>> >> > >> > > g = traversal().withRemote(conn)
>> >> > >> > > gtx = g.tx().create()
>> >> > >> > > gtx.addV('person').iterate()
>> >> > >> > > gtx.addV('software').iterate()
>> >> > >> > > gtx.close() // alternatively you could explicitly commit() or
>> >> > >> > > rollback()
>> >> > >> > >
>> >> > >> > > // you could still use g for sessionless, but gtx is "done"
>> >> after //
>> >> > >> > > you close so you will need to create() a new gtx instance for
>> a
>> >> //
>> >> > >> > > fresh transaction assert 2 == g.V().count().next()
>> >> > >> > >
>> >> > >> > > Note that the create() method on tx() is the new bit of
>> necessary
>> >> > >> syntax.
>> >> > >> > > Technically, it is a do-nothing for embedded mode (and you
>> could
>> >> > >> > > skip it for thread-local auto-transactions) but all the
>> >> > >> > > documentation can be shifted so that the API is identical to
>> >> remote.
>> >> > >> > > The change would be non-breaking as the embedded transaction
>> >> > >> > > approach would remain as it is, but would no longer be
>> >> documented as
>> >> > >> > > the preferred approach. Perhaps we could one day disallow it
>> but
>> >> > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we
>> want
>> >> to
>> >> > >> touch in TP3.
>> >> > >> > >
>> >> > >> > > What happens behind the scenes of g.tx().create() is that in
>> >> remote
>> >> > >> > > cases the RemoteConnection constructs a remote Transaction
>> >> > >> > > implementation which basically is used to spawn new traversal
>> >> source
>> >> > >> > > instances with a session based connections. The remote
>> >> Transaction
>> >> > >> > > object can't support
>> >> > >> > transaction
>> >> > >> > > listeners or special read-write/close configurations, but I
>> don't
>> >> > >> > > think that's a problem as those things don't make a lot of
>> sense
>> >> in
>> >> > >> > > remote use cases.
>> >> > >> > >
>> >> > >> > > From the server perspective, this change would also mean that
>> >> > >> > > sessions would have to accept bytecode and not just scripts,
>> >> which
>> >> > >> > > technically shouldn't be a problem.
>> >> > >> > >
>> >> > >> > > One downside at the moment is that i'm thinking mostly in
>> Java at
>> >> > >> > > the moment. I'm not sure how this all works in other variants
>> >> just
>> >> > >> > > yet, but
>> >> > >> > I'd
>> >> > >> > > hope to keep similar syntax.
>> >> > >> > >
>> >> > >> > > I will keep experimenting tomorrow. If you have any thoughts
>> on
>> >> the
>> >> > >> > matter,
>> >> > >> > > I'd be happy to hear them. Thanks!
>> >> > >> > >
>> >> > >> >
>> >> > >>
>> >> > >>
>> >> >
>> >>
>> >
>>
>>

Re: [DSISCUSS] Remote Transactions

Reply via email to