I started a fresh thread for this topic Divij brought up, with more context:
> In a scenario where we have both > session-less and sesion-ful requests being made to the server, the resource > allocation will not be fair and may lead to starvation for some requests. > This problem exists even today, hence not totally correlated to what you > are proposing but I am afraid a wider adoption of explicit > transactions will bring this problem to the spotlight. Hence, we should fix > this problem first. A solution would entail converging the worker pool for > both session vs session-less requests. I'm not sure we can get that done in 3.5.0, but maybe? I suppose the question is how would we ensure that each request for a session ends up being executed on the same thread from the previous request if jobs are randomly submitted to a worker pool? On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya <[email protected]> wrote: > Great idea Stephen. I agree with standardizing the explicit transaction > semantics in cases of remote server (implicit aka sessionless is already > self explanatory) and unifying the syntax with embedded graph usage. > > Couple of notes: > > 1. I would favor the begin() instead of create() as it's closer to > universally prominent SQL transaction syntax. This would reduce the > learning curve for users of TP. > > 2. From an implementation standpoint, sessionless requests are queued on > the server side and are serviced by the worker thread pool. But a > session-ful aka explicit transaction aka managed transaction starts a new > single worked thread pool every time. In a scenario where we have both > session-less and sesion-ful requests being made to the server, the resource > allocation will not be fair and may lead to starvation for some requests. > This problem exists even today, hence not totally correlated to what you > are proposing but I am afraid a wider adoption of explicit > transactions will bring this problem to the spotlight. Hence, we should fix > this problem first. A solution would entail converging the worker pool for > both session vs session-less requests. > > 3. You are proposing the idea of having a transaction scoped traversal > object. I agree with it but we need more clarification in behavior wrt the > following scenarios: > > Q. What happens when g.tx().commit() is called on a transaction scoped > traversal object? Does the traversal get closed? > Q. Currently, the same traversal object could be used to execute multiple > session-less requests simultaneously in a thread safe manner. Does the same > behaviour apply to transaction scoped traversal? If not, then how do I send > multiple requests in parallel from the client all scoped to the same > transaction on the server? Note that this is different from case of multi > threaded transactions because on the server all requests are scoped to > single transaction i.e. single thread but on the client they may be > submitted via multiple threads. > > Regards, > Divij Vaidya > > > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette <[email protected]> > wrote: > > > The Transaction object is a bit important with remote cases because it > > holds the reference to the session. With embedded use cases we generally > > adhere to ThreadLocal transactions so the Transaction implementation in > > that case is more of a controller for the current thread but for remote > > cases the Transaction implementation holds a bit of state that can cross > > over threads. In some ways, that makes remote cases feel like a "threaded > > transaction" so that may be familiar to users?? Here's some example > > syntax I currently have working in a test case: > > > > g = traversal().withRemote(conn) > > gtx = g.tx().create() > > assert gtx.isOpen() == true > > gtx.addV('person').iterate() > > gtx.addV('software').iterate() > > gtx.commit() // alternatively you could explicitly rollback() > > assert gtx.isOpen() == false > > > > I hope that documentation changes are enough to unify the syntax and > remove > > confusion despite there still being ThreadLocal transactions as a default > > for embedded cases and something else for remote. At least they will look > > the same, even if you technically don't need to do a g.tx().create() for > > embedded transactions and can just have the control you always had with > > g.tx().commit() directly. > > > > Note that the remote Transaction object can support configuration via > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the default > of > > "commit on close" to rollback or manual (or i suppose something custom). > > It's nice that this piece works. I don't see a point in supporting > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in remote > > contexts. > > > > Another point is that I've added what I termed a "Graph Operation" > bytecode > > instances which is bytecode that isn't related to traversal > construction. I > > hope this isn't one of those things where we end up wanting to deprecate > an > > idea as fast as we added it but we needed some way to pass > commit/rollback > > commands and they aren't really part of traversal construction. They are > > operations that execute on the graph itself and I couldn't think of how > to > > treat them as traversals nicely, so they sorta just exist as a one off. > > Perhaps they will grow in number?? Folks have always asked if bytecode > > requests could get "GraphFeature" information - I suppose this could be a > > way to do that?? > > > > Anyway, I will keep going down this general path as it's appearing > > relatively fruitful. > > > > > > > > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette <[email protected]> > > wrote: > > > > > > We should just communicate clearly that a simple remote traversal > > > already uses a transaction by default, > > > > > > That's a good point because even with this change we still have a > > > situation where a single iterated remote traversal will generally mean > a > > > server managed commit/rollback, but in embedded mode, we have an open > > > transaction (for transaction enabled graphs - TinkerGraph will behave > > more > > > like a remote traversal, so more confusion there i guess). I'm not sure > > how > > > to rectify that at this time except by way of documentation. > > > > > > The only thing I can think of is that the default for embedded would > have > > > to be auto-commit per traversal. In that way it would work like remote > > > traversals and for graphs that don't support transactions like > > TinkerGraph. > > > Of course, that's the kind of change that will break a lot of code. > Maybe > > > we just keep that idea for another version. > > > > > > On Thu, Mar 18, 2021 at 4:21 AM <[email protected]> wrote: > > > > > >> I like the idea of adding transactions for remote traversals as they > > >> close a gap in functionality that we currently have for remote > > traversals. > > >> We should just communicate clearly that a simple remote traversal > > already > > >> uses a transaction by default, meaning that the server will roll back > > any > > >> changes if any exception occurs. Users often ask about how to do > > >> transactions with a remote traversal because they don't know about > this > > and > > >> I'm afraid that we might add even more confusion if we now add > > transactions > > >> for remote traversals. Hence why I think that we should communicate > this > > >> clearly when we add remote transactions. > > >> > > >> Getting this to work in the GLVs should be possible but will require > > some > > >> effort. I think we would have to introduce some > > >> "DriverRemoteTransactionTraversal" that doesn't submit the traversal > on > > a > > >> terminal step but saves it and then submits all saved traversals > > together > > >> on close(). > > >> This also means that we should directly add an async version of > close(), > > >> maybe closeAsync()? (Same for commit() and rollback()) > > >> > > >> I also like create() better than traversal() because it would confuse > me > > >> to first start a traversal with traversal() and then also start a > > >> transaction with the same method. > > >> > > >> -----Ursprüngliche Nachricht----- > > >> Von: Stephen Mallette <[email protected]> > > >> Gesendet: Donnerstag, 18. März 2021 00:01 > > >> An: [email protected] > > >> Betreff: Re: [DSISCUSS] Remote Transactions > > >> > > >> One thing I should have noted about tx().create(). The create() very > > well > > >> could have been named traversal(), thus the previous example reading > as: > > >> > > >> g = traversal().withEmbedded(graph) > > >> // or > > >> g = traversal().withRemote(conn) > > >> gtx = g.tx().traversal() > > >> gtx.addV('person').iterate() > > >> gtx.addV('software').iterate() > > >> gtx.close() // alternatively you could explicitly commit() or > rollback() > > >> > > >> You're basically spawning a new GraphTraversalSource from the > > Transaction > > >> object rather than from a Graph or AnonymousTraversal. I chose > create() > > >> because it felt like this looked weird: > > >> > > >> g = traversal().withRemote(conn).tx().traversal() > > >> > > >> which would be a weird thing to do I guess, but just seeing > > "traversal()" > > >> over and over seemed odd looking and I wanted to differentiate with > the > > >> Transaction object even though the methods are identical in what they > > do. I > > >> suppose I also drew inspiration from: > > >> > > >> Transaction.createdThreadedTx() > > >> > > >> which I think we might consider deprecating. JanusGraph would simply > > >> expose their own Transaction object in the future that has that method > > as I > > >> imagine folks still use that feature. As far as I know, no other graph > > >> implements that functionality and my guess is that no one will likely > > do so > > >> in the future. > > >> > > >> anyway, if you like traversal() better than create() or the other way > > >> around, please let me know > > >> > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger <[email protected] > > > > >> wrote: > > >> > > >> > I am in favor of this change as I any idea that unifies the multiple > > >> > different ways things work in TP will only make it easier to learn > and > > >> > adopt. > > >> > > > >> > I don't know that I have enough knowledge of the inner workings of > > >> > transactions to know what/if this will cause problems. > > >> > > > >> > Dave > > >> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette > > >> > <[email protected]> > > >> > wrote: > > >> > > > >> > > We haven't touched "transactions" since TP3 was originally > designed. > > >> > > They have remained a feature for embedded use cases even in the > face > > >> > > of the > > >> > rise > > >> > > of remote graph use cases and result in major inconsistencies that > > >> > > really bother users. > > >> > > > > >> > > As we close on 3.5.0, I figured that perhaps there was a chance to > > >> > > do something radical about transactions. Basically, I'd like > > >> > > transactions to work irrespective of remote or embedded usage and > > >> > > for them to have the > > >> > same > > >> > > API when doing so. In mulling it over for the last day or so I > had a > > >> > > realization that makes me believe that the following is possible: > > >> > > > > >> > > g = traversal().withEmbedded(graph) > > >> > > // or > > >> > > g = traversal().withRemote(conn) > > >> > > gtx = g.tx().create() > > >> > > gtx.addV('person').iterate() > > >> > > gtx.addV('software').iterate() > > >> > > gtx.close() // alternatively you could explicitly commit() or > > >> > > rollback() > > >> > > > > >> > > // you could still use g for sessionless, but gtx is "done" after > // > > >> > > you close so you will need to create() a new gtx instance for a // > > >> > > fresh transaction assert 2 == g.V().count().next() > > >> > > > > >> > > Note that the create() method on tx() is the new bit of necessary > > >> syntax. > > >> > > Technically, it is a do-nothing for embedded mode (and you could > > >> > > skip it for thread-local auto-transactions) but all the > > >> > > documentation can be shifted so that the API is identical to > remote. > > >> > > The change would be non-breaking as the embedded transaction > > >> > > approach would remain as it is, but would no longer be documented > as > > >> > > the preferred approach. Perhaps we could one day disallow it but > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we want to > > >> touch in TP3. > > >> > > > > >> > > What happens behind the scenes of g.tx().create() is that in > remote > > >> > > cases the RemoteConnection constructs a remote Transaction > > >> > > implementation which basically is used to spawn new traversal > source > > >> > > instances with a session based connections. The remote Transaction > > >> > > object can't support > > >> > transaction > > >> > > listeners or special read-write/close configurations, but I don't > > >> > > think that's a problem as those things don't make a lot of sense > in > > >> > > remote use cases. > > >> > > > > >> > > From the server perspective, this change would also mean that > > >> > > sessions would have to accept bytecode and not just scripts, which > > >> > > technically shouldn't be a problem. > > >> > > > > >> > > One downside at the moment is that i'm thinking mostly in Java at > > >> > > the moment. I'm not sure how this all works in other variants just > > >> > > yet, but > > >> > I'd > > >> > > hope to keep similar syntax. > > >> > > > > >> > > I will keep experimenting tomorrow. If you have any thoughts on > the > > >> > matter, > > >> > > I'd be happy to hear them. Thanks! > > >> > > > > >> > > > >> > > >> > > >
