I'd left this work temporarily to focus on the UnifiedChannelizer. Ultimately, that body of work seems to satisfy the worries I had about the copy/paste approach I took to get things working here with the old OpProcessor stuff. Given that the UnifiedChannelizer will eventually replace OpProcessor, I think we can live with the copy/paste approach until the UnifiedChannelizer becomes the default, which I hope we can see firm up over 3.5.x. If that's ok with everyone I think I will focus on firming up documentation and testing for remote g.tx() and get a PR issued.
On Mon, Mar 22, 2021 at 6:14 AM Stephen Mallette <[email protected]> wrote: > > I assume there will also be an explicit rollback ? > > yes, you could call rollback() directly (I think that's what you're > asking): > > g = traversal().withRemote(conn) > gtx = g.tx().create() > assert gtx.isOpen() == true > gtx.addV('person').iterate() > gtx.addV('software').iterate() > gtx.rollback() // alternatively you could explicitly commit() > assert gtx.isOpen() == false > > > > > On Sat, Mar 20, 2021 at 10:34 AM Kelvin Lawrence <[email protected]> > wrote: > >> Just a quick comment to add to what others have said. This functionality >> has been a common ask by users who do not want to have to fall back to >> sending text queries just to get the semantics of a session. Thanks for >> taking this on. It will be a nice addition to the Gremlin client/server >> interaction model. The syntax you proposed looks good to me. I assume there >> will also be an explicit rollback ? >> >> Cheers, >> Kelvin >> >> Kelvin R. Lawrence >> http://www.kelvinlawrence.net >> https://www.linkedin.com/in/krlawrence/ >> Twitter: @gfxman >> >> From: Stephen Mallette >> Sent: Thursday, March 18, 2021 7:16 PM >> To: [email protected] >> Subject: Re: [DSISCUSS] Remote Transactions >> >> I've created this issue for tracking: >> >> https://issues.apache.org/jira/browse/TINKERPOP-2537 >> >> and have pushed a branch for early review: >> >> https://github.com/apache/tinkerpop/tree/TINKERPOP-2537 >> >> The server-side is a bit of a copy/paste mess at the moment, so try not to >> pay attention there too much and focus more on the driver and Transaction >> interfaces side of things. At this point I'm amazed at how easily this >> fell >> together. It feels like I'm almost overlooking something important as I'd >> not have guessed this to go in so straightforwardly. The hardest part was >> figuring out the interaction among the various interfaces to get the user >> API right but once that was untangled, stuff just started working. >> >> At this point, I think I need to pay some attention to the server-side >> to make this pull request ready. I basically cut/paste a giant grip of >> code >> from TraversalOpProcessor into SessionOpProcessor and tweaked a few lines >> here and there. Not terribly ideal, but it was the fastest way to be sure >> that the general premise of the client-side was going to actually work. >> Now >> the question is how to make the server part nice, but refactoring there >> isn't traditionally easy unfortunately. >> >> >> >> >> On Thu, Mar 18, 2021 at 12:53 PM Stephen Mallette <[email protected]> >> wrote: >> >> > First of all, I managed to fumble my last code sample - sorry. You can >> > either use the Transaction object explicitly like this: >> > >> > g = traversal().withRemote(conn) >> > tx = g.tx() // Transaction object >> > gtx = tx.begin() // GraphTraversalSource >> > assert tx.isOpen() == true >> > gtx.addV('person').iterate() >> > gtx.addV('software').iterate() >> > tx.commit() // alternatively you could explicitly rollback() >> > assert tx.isOpen() == false >> > >> > Or you can look more like the old style: >> > >> > g = traversal().withRemote(conn) >> > gtx = g.tx().begin() // GraphTraversalSource >> > assert gtx.tx().isOpen() == true >> > gtx.addV('person').iterate() >> > gtx.addV('software').iterate() >> > gtx.tx().commit() // alternatively you could explicitly rollback() >> > assert gtx.tx().isOpen() == false >> > >> > Both look fairly fluid I think and I'm especially pleased with the >> second >> > one. Need to do a bit more to prevent weird nested transactions and such >> > but pretty neat for a just a day's worth of hacking about. >> > >> > Answers to Divij's questions inline below: >> > >> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <[email protected]> >> > wrote: >> > >> >> Great idea Stephen. I agree with standardizing the explicit transaction >> >> semantics in cases of remote server (implicit aka sessionless is >> already >> >> self explanatory) and unifying the syntax with embedded graph usage. >> >> >> >> Couple of notes: >> >> >> >> 1. I would favor the begin() instead of create() as it's closer to >> >> universally prominent SQL transaction syntax. This would reduce the >> >> learning curve for users of TP. >> >> >> > >> > I'm ok with begin(). >> > >> > >> >> 2. From an implementation standpoint, sessionless requests are queued >> on >> >> the server side and are serviced by the worker thread pool. But a >> >> session-ful aka explicit transaction aka managed transaction starts a >> new >> >> single worked thread pool every time. In a scenario where we have both >> >> session-less and sesion-ful requests being made to the server, the >> >> resource >> >> allocation will not be fair and may lead to starvation for some >> requests. >> >> This problem exists even today, hence not totally correlated to what >> you >> >> are proposing but I am afraid a wider adoption of explicit >> >> transactions will bring this problem to the spotlight. Hence, we should >> >> fix >> >> this problem first. A solution would entail converging the worker pool >> for >> >> both session vs session-less requests. >> >> >> >> 3. You are proposing the idea of having a transaction scoped traversal >> >> object. I agree with it but we need more clarification in behavior wrt >> the >> >> following scenarios: >> >> >> >> Q. What happens when g.tx().commit() is called on a transaction scoped >> >> traversal object? Does the traversal get closed? >> >> >> > >> > If you call gtx.tx().commit() then gtx is dead. If you try do another >> > traversal with that it will throw an exception. If you call gtx.close() >> it >> > will close the session and then whatever default handling by the server >> > will come into play to commit or rollback. if you call g.tx().commit(), >> > meaning on the "parent" to gtx, then you basically send a commit >> operation >> > to the server and that's that. >> > >> > Q. Currently, the same traversal object could be used to execute >> multiple >> >> session-less requests simultaneously in a thread safe manner. Does the >> >> same >> >> behaviour apply to transaction scoped traversal? >> > >> > >> > Yes. I would expect similar behavior there, though as you note below >> they >> > will just get processed in the server serially as they arrive and not in >> > parallel. >> > >> > >> >> If not, then how do I send >> >> multiple requests in parallel from the client all scoped to the same >> >> transaction on the server? Note that this is different from case of >> multi >> >> threaded transactions because on the server all requests are scoped to >> >> single transaction i.e. single thread but on the client they may be >> >> submitted via multiple threads. >> >> >> >> Regards, >> >> Divij Vaidya >> >> >> >> >> >> >> >> On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <[email protected] >> > >> >> wrote: >> >> >> >> > The Transaction object is a bit important with remote cases because >> it >> >> > holds the reference to the session. With embedded use cases we >> generally >> >> > adhere to ThreadLocal transactions so the Transaction implementation >> in >> >> > that case is more of a controller for the current thread but for >> remote >> >> > cases the Transaction implementation holds a bit of state that can >> cross >> >> > over threads. In some ways, that makes remote cases feel like a >> >> "threaded >> >> > transaction" so that may be familiar to users?? Here's some example >> >> > syntax I currently have working in a test case: >> >> > >> >> > g = traversal().withRemote(conn) >> >> > gtx = g.tx().create() >> >> > assert gtx.isOpen() == true >> >> > gtx.addV('person').iterate() >> >> > gtx.addV('software').iterate() >> >> > gtx.commit() // alternatively you could explicitly rollback() >> >> > assert gtx.isOpen() == false >> >> > >> >> > I hope that documentation changes are enough to unify the syntax and >> >> remove >> >> > confusion despite there still being ThreadLocal transactions as a >> >> default >> >> > for embedded cases and something else for remote. At least they will >> >> look >> >> > the same, even if you technically don't need to do a g.tx().create() >> for >> >> > embedded transactions and can just have the control you always had >> with >> >> > g.tx().commit() directly. >> >> > >> >> > Note that the remote Transaction object can support configuration via >> >> > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the >> default >> >> of >> >> > "commit on close" to rollback or manual (or i suppose something >> custom). >> >> > It's nice that this piece works. I don't see a point in supporting >> >> > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in >> remote >> >> > contexts. >> >> > >> >> > Another point is that I've added what I termed a "Graph Operation" >> >> bytecode >> >> > instances which is bytecode that isn't related to traversal >> >> construction. I >> >> > hope this isn't one of those things where we end up wanting to >> >> deprecate an >> >> > idea as fast as we added it but we needed some way to pass >> >> commit/rollback >> >> > commands and they aren't really part of traversal construction. They >> are >> >> > operations that execute on the graph itself and I couldn't think of >> how >> >> to >> >> > treat them as traversals nicely, so they sorta just exist as a one >> off. >> >> > Perhaps they will grow in number?? Folks have always asked if >> bytecode >> >> > requests could get "GraphFeature" information - I suppose this could >> be >> >> a >> >> > way to do that?? >> >> > >> >> > Anyway, I will keep going down this general path as it's appearing >> >> > relatively fruitful. >> >> > >> >> > >> >> > >> >> > >> >> > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette < >> [email protected]> >> >> > wrote: >> >> > >> >> > > > We should just communicate clearly that a simple remote traversal >> >> > > already uses a transaction by default, >> >> > > >> >> > > That's a good point because even with this change we still have a >> >> > > situation where a single iterated remote traversal will generally >> >> mean a >> >> > > server managed commit/rollback, but in embedded mode, we have an >> open >> >> > > transaction (for transaction enabled graphs - TinkerGraph will >> behave >> >> > more >> >> > > like a remote traversal, so more confusion there i guess). I'm not >> >> sure >> >> > how >> >> > > to rectify that at this time except by way of documentation. >> >> > > >> >> > > The only thing I can think of is that the default for embedded >> would >> >> have >> >> > > to be auto-commit per traversal. In that way it would work like >> remote >> >> > > traversals and for graphs that don't support transactions like >> >> > TinkerGraph. >> >> > > Of course, that's the kind of change that will break a lot of code. >> >> Maybe >> >> > > we just keep that idea for another version. >> >> > > >> >> > > On Thu, Mar 18, 2021 at 4:21 AM <[email protected]> wrote: >> >> > > >> >> > >> I like the idea of adding transactions for remote traversals as >> they >> >> > >> close a gap in functionality that we currently have for remote >> >> > traversals. >> >> > >> We should just communicate clearly that a simple remote traversal >> >> > already >> >> > >> uses a transaction by default, meaning that the server will roll >> back >> >> > any >> >> > >> changes if any exception occurs. Users often ask about how to do >> >> > >> transactions with a remote traversal because they don't know about >> >> this >> >> > and >> >> > >> I'm afraid that we might add even more confusion if we now add >> >> > transactions >> >> > >> for remote traversals. Hence why I think that we should >> communicate >> >> this >> >> > >> clearly when we add remote transactions. >> >> > >> >> >> > >> Getting this to work in the GLVs should be possible but will >> require >> >> > some >> >> > >> effort. I think we would have to introduce some >> >> > >> "DriverRemoteTransactionTraversal" that doesn't submit the >> traversal >> >> on >> >> > a >> >> > >> terminal step but saves it and then submits all saved traversals >> >> > together >> >> > >> on close(). >> >> > >> This also means that we should directly add an async version of >> >> close(), >> >> > >> maybe closeAsync()? (Same for commit() and rollback()) >> >> > >> >> >> > >> I also like create() better than traversal() because it would >> >> confuse me >> >> > >> to first start a traversal with traversal() and then also start a >> >> > >> transaction with the same method. >> >> > >> >> >> > >> -----Ursprüngliche Nachricht----- >> >> > >> Von: Stephen Mallette <[email protected]> >> >> > >> Gesendet: Donnerstag, 18. März 2021 00:01 >> >> > >> An: [email protected] >> >> > >> Betreff: Re: [DSISCUSS] Remote Transactions >> >> > >> >> >> > >> One thing I should have noted about tx().create(). The create() >> very >> >> > well >> >> > >> could have been named traversal(), thus the previous example >> reading >> >> as: >> >> > >> >> >> > >> g = traversal().withEmbedded(graph) >> >> > >> // or >> >> > >> g = traversal().withRemote(conn) >> >> > >> gtx = g.tx().traversal() >> >> > >> gtx.addV('person').iterate() >> >> > >> gtx.addV('software').iterate() >> >> > >> gtx.close() // alternatively you could explicitly commit() or >> >> rollback() >> >> > >> >> >> > >> You're basically spawning a new GraphTraversalSource from the >> >> > Transaction >> >> > >> object rather than from a Graph or AnonymousTraversal. I chose >> >> create() >> >> > >> because it felt like this looked weird: >> >> > >> >> >> > >> g = traversal().withRemote(conn).tx().traversal() >> >> > >> >> >> > >> which would be a weird thing to do I guess, but just seeing >> >> > "traversal()" >> >> > >> over and over seemed odd looking and I wanted to differentiate >> with >> >> the >> >> > >> Transaction object even though the methods are identical in what >> they >> >> > do. I >> >> > >> suppose I also drew inspiration from: >> >> > >> >> >> > >> Transaction.createdThreadedTx() >> >> > >> >> >> > >> which I think we might consider deprecating. JanusGraph would >> simply >> >> > >> expose their own Transaction object in the future that has that >> >> method >> >> > as I >> >> > >> imagine folks still use that feature. As far as I know, no other >> >> graph >> >> > >> implements that functionality and my guess is that no one will >> likely >> >> > do so >> >> > >> in the future. >> >> > >> >> >> > >> anyway, if you like traversal() better than create() or the other >> way >> >> > >> around, please let me know >> >> > >> >> >> > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger < >> >> [email protected]> >> >> > >> wrote: >> >> > >> >> >> > >> > I am in favor of this change as I any idea that unifies the >> >> multiple >> >> > >> > different ways things work in TP will only make it easier to >> learn >> >> and >> >> > >> > adopt. >> >> > >> > >> >> > >> > I don't know that I have enough knowledge of the inner workings >> of >> >> > >> > transactions to know what/if this will cause problems. >> >> > >> > >> >> > >> > Dave >> >> > >> > >> >> > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette >> >> > >> > <[email protected]> >> >> > >> > wrote: >> >> > >> > >> >> > >> > > We haven't touched "transactions" since TP3 was originally >> >> designed. >> >> > >> > > They have remained a feature for embedded use cases even in >> the >> >> face >> >> > >> > > of the >> >> > >> > rise >> >> > >> > > of remote graph use cases and result in major inconsistencies >> >> that >> >> > >> > > really bother users. >> >> > >> > > >> >> > >> > > As we close on 3.5.0, I figured that perhaps there was a >> chance >> >> to >> >> > >> > > do something radical about transactions. Basically, I'd like >> >> > >> > > transactions to work irrespective of remote or embedded usage >> and >> >> > >> > > for them to have the >> >> > >> > same >> >> > >> > > API when doing so. In mulling it over for the last day or so I >> >> had a >> >> > >> > > realization that makes me believe that the following is >> possible: >> >> > >> > > >> >> > >> > > g = traversal().withEmbedded(graph) >> >> > >> > > // or >> >> > >> > > g = traversal().withRemote(conn) >> >> > >> > > gtx = g.tx().create() >> >> > >> > > gtx.addV('person').iterate() >> >> > >> > > gtx.addV('software').iterate() >> >> > >> > > gtx.close() // alternatively you could explicitly commit() or >> >> > >> > > rollback() >> >> > >> > > >> >> > >> > > // you could still use g for sessionless, but gtx is "done" >> >> after // >> >> > >> > > you close so you will need to create() a new gtx instance for >> a >> >> // >> >> > >> > > fresh transaction assert 2 == g.V().count().next() >> >> > >> > > >> >> > >> > > Note that the create() method on tx() is the new bit of >> necessary >> >> > >> syntax. >> >> > >> > > Technically, it is a do-nothing for embedded mode (and you >> could >> >> > >> > > skip it for thread-local auto-transactions) but all the >> >> > >> > > documentation can be shifted so that the API is identical to >> >> remote. >> >> > >> > > The change would be non-breaking as the embedded transaction >> >> > >> > > approach would remain as it is, but would no longer be >> >> documented as >> >> > >> > > the preferred approach. Perhaps we could one day disallow it >> but >> >> > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we >> want >> >> to >> >> > >> touch in TP3. >> >> > >> > > >> >> > >> > > What happens behind the scenes of g.tx().create() is that in >> >> remote >> >> > >> > > cases the RemoteConnection constructs a remote Transaction >> >> > >> > > implementation which basically is used to spawn new traversal >> >> source >> >> > >> > > instances with a session based connections. The remote >> >> Transaction >> >> > >> > > object can't support >> >> > >> > transaction >> >> > >> > > listeners or special read-write/close configurations, but I >> don't >> >> > >> > > think that's a problem as those things don't make a lot of >> sense >> >> in >> >> > >> > > remote use cases. >> >> > >> > > >> >> > >> > > From the server perspective, this change would also mean that >> >> > >> > > sessions would have to accept bytecode and not just scripts, >> >> which >> >> > >> > > technically shouldn't be a problem. >> >> > >> > > >> >> > >> > > One downside at the moment is that i'm thinking mostly in >> Java at >> >> > >> > > the moment. I'm not sure how this all works in other variants >> >> just >> >> > >> > > yet, but >> >> > >> > I'd >> >> > >> > > hope to keep similar syntax. >> >> > >> > > >> >> > >> > > I will keep experimenting tomorrow. If you have any thoughts >> on >> >> the >> >> > >> > matter, >> >> > >> > > I'd be happy to hear them. Thanks! >> >> > >> > > >> >> > >> > >> >> > >> >> >> > >> >> >> > >> >> >> > >> >>
