I can't seem to get Travis to pass the build for this branch with the UnifiedChannelizer enabled.
https://travis-ci.com/github/apache/tinkerpop/builds/221427334 It just hangs without much information. I can't seem to get it to behave otherwise and it doesn't fail for me locally (of course). If anyone has some free CPU cycles could you please do: mvn clean install -DskipTests && mvn verify -pl :gremlin-server -DskipTests -DskipIntegrationTests=false -DincludeNeo4j -DtestUnified=true i'm curious if anyone else can recreate the issue.....Thanks On Thu, Mar 25, 2021 at 3:33 PM Stephen Mallette <[email protected]> wrote: > It took me a long time to get to the final steps of this change. Uncovered > some unexpected bugs in the WsAndHttpChannelizer and that was gnarly to > sort out. Also found some problems with authorization through the > SaslAndHttpBasicAuthenticationHandler. I can't say I really like how these > combined channelizers really work. Seems like the new tangle of stuff in > Gremlin Server. Not sure how much of that I can sort that out for 3.5.0 at > this point. I might make some attempts on this body of work since it's > fresh in my mind but we'll see. > > After getting through these little nits I've managed to get all of the > Gremlin Server integration test cases passing with the new > UnifiedChannelizer. I've added a -DtestUnified property to run the > UnifiedChannelizer rather than the old stuff. This way we can add a new job > to travis to cover that by itself in parallel to the old. You can run it > like this: > > mvn verify -pl gremlin-server -DskipIntegrationTests=false > -DtestUnified=true > > Having all the tests pass on the new channelizer makes me feel pretty > confident that what I have at least works (and without much change in > behavior) and therefore have pushed a branch for early review: > > https://github.com/apache/tinkerpop/tree/TINKERPOP-2245 > > I still need to write a few more tests and after that it would be time to > do a bit of performance testing to see how it compares to the old method. > I'm also missing "metrics". If all that is good then I'll need to do a gang > of documentation work. I imagine I will continue to keep this work separate > from the bytecode transaction work that spawned it and merge the > transaction stuff first followed by this. In that way, expect this branch > to be a bit volatile with rebasing. > > > > > > On Tue, Mar 23, 2021 at 6:51 PM Stephen Mallette <[email protected]> > wrote: > >> More reasonable progress with the UnifiedChannelizer. The short version >> is that I have a pretty major quantity of the server integration tests >> passing under this new Channelizer. The ones that are failing are typically >> ones that assert some specific log messages or exception message which may >> not necessarily apply anymore. I'm still mulling those over. >> >> The basic structure of the UnifiedChannelizer is that it sorta copies the >> WsAndHttpChannelizer and builds from there, so when this UnfiedChannelizer >> is configured you'll get HTTP and Websockets. The UnifiedChannelizer gets >> rid of the entire OpProcessor infrastructure. I think that was a good body >> of code years ago, but we've now learned quite explicitly what we want >> Gremlin Server to do and what role it plays. OpProcessor just feels like an >> unnecessary abstraction now. Removing it simplifies the code of the >> UnifiedChannelizer, which adds a single UnifiedHandler to the netty >> pipeline. That UnifiedHandler streamlines all those OpProcessor >> implementations into one so all the duplicate code is effectively removed >> by running RequestMessage instances through a single common flow. That >> means all the request validation, error handling, iterator processing, >> result building, transaction management, etc. will be in one place. >> >> The UnifiedHandler also kills out the GremlinExecutor. That means all the >> Lifecycle callbacks and random lambdas that generated so much indirection >> in the name of extensibility are no longer confusing the code base and the >> wicked hierarchy of timesouts and exception throws are all flattened. >> >> When the UnifiedHandler gets a RequestMessage it validates it, builds a >> Context and creates a Rexster instance (thought i'd bring back the name >> rather than go with the generic "Worker" interface) that has the Context >> added to it as a task. The Rexster instance is then submitted to the >> Gremlin thread pool to do work when a thread is available. Rexster will >> hang on to that thread awaiting tasks that are assigned to it. For >> sessionless requests, that means Rexster handles one request and exits. For >> a sessionful request it means the Rexster will wait for more tasks (Context >> objects from new requests) to be added to it. The UnifiedHandler holds >> those Rexster instances in a Map aligning session identifiers to Rexster >> instances. As Divij alluded to, I suppose this means we are putting down >> foundation for query cancellation and better insight into the queue of >> things that are running. >> >> Now that the gremlinPool server setting covers both types of requests we >> no longer have a situation where the number of threads that can be running >> is unbounded and a "session" will now look a bit like a "long run job" in >> that it will consume a thread from the pool for as long as the session is >> open. So if the gremlinPool is 16, you could have 16 sessions active. >> Sessionless requests would begin to queue as would requests for new >> sessions and they would be serviced in the order received. >> >> I'm still not quite ready for a pull request. Despite having the bulk of >> the integration tests working, the code isn't quite the way I think it will >> be finally organized (I used a lot of inner classes and things which >> probably should change). I also need to figure out how to run the tests. I >> think I need to run them all per channelizer (old and new) but I think that >> will push the build time over maximum on Travis, so I may need to break >> things up somehow. I'm reasonably sure I'll have a branch ready to push >> before the end of the week. >> >> >> >> >> >> On Mon, Mar 22, 2021 at 5:22 PM Stephen Mallette <[email protected]> >> wrote: >> >>> I've started to play with some ideas for pushing all requests >>> scripts/bytecode sessions/sessionless down one unified pipeline with one >>> shared thread pool. The most interesting thing I've seen so far is how much >>> the code could potentially simplify under this approach. The >>> GremlinExecutor was the wrong abstraction in a sense and it forced way too >>> many callbacks as arguments and made error handling kinda sloppy which made >>> the server code harder to reuse, extend and read. Of course, all that was >>> built before the first remote graph was built and we were just wildly >>> guessing at how DS Graph was going to work with all this. Now, I think the >>> remote model is much more clear, so maybe all this can be made "right" now. >>> >>> I think that if I can see any success at all with this in the next few >>> days, we could include a new UnfiiedChannelizer that would replace the >>> existing ones. I think we'd keep the old one in place by default and keep >>> my copy/paste work for SessionOpProcessor that added bytecode there, with >>> the idea that the UnifiedChannelizer will become the future default as we >>> get it tested further along the 3.5.x line. >>> >>> I imagine I'll have more details on this task tomorrow and will post >>> back here when I do. >>> >>> >>> On Thu, Mar 18, 2021 at 12:37 PM Divij Vaidya <[email protected]> >>> wrote: >>> >>>> > >>>> > I suppose the >>>> > question is how would we ensure that each request for a session ends >>>> up >>>> > being executed on the same thread from the previous request if jobs >>>> are >>>> > randomly submitted to a worker pool? >>>> >>>> >>>> I haven't thought through the details, but on top of my head, we would >>>> have >>>> to maintain some request<->thread mapping on the server. This mapping is >>>> also a building block for a request cancellation feature in future >>>> where a >>>> client would be able to send a cancellation request to the server, the >>>> server will map the request to a thread executing that request and then >>>> set >>>> an interrupt on that thread to signify cancellation. >>>> >>>> Divij Vaidya >>>> >>>> >>>> >>>> On Thu, Mar 18, 2021 at 5:00 PM Stephen Mallette <[email protected]> >>>> wrote: >>>> >>>> > I started a fresh thread for this topic Divij brought up, with more >>>> > context: >>>> > >>>> > > In a scenario where we have both >>>> > > session-less and sesion-ful requests being made to the server, the >>>> > resource >>>> > > allocation will not be fair and may lead to starvation for some >>>> requests. >>>> > > This problem exists even today, hence not totally correlated to >>>> what you >>>> > > are proposing but I am afraid a wider adoption of explicit >>>> > > transactions will bring this problem to the spotlight. Hence, we >>>> should >>>> > fix >>>> > > this problem first. A solution would entail converging the worker >>>> pool >>>> > for >>>> > > both session vs session-less requests. >>>> > >>>> > I'm not sure we can get that done in 3.5.0, but maybe? I suppose the >>>> > question is how would we ensure that each request for a session ends >>>> up >>>> > being executed on the same thread from the previous request if jobs >>>> are >>>> > randomly submitted to a worker pool? >>>> > >>>> > >>>> > On Thu, Mar 18, 2021 at 11:52 AM Divij Vaidya < >>>> [email protected]> >>>> > wrote: >>>> > >>>> > > Great idea Stephen. I agree with standardizing the explicit >>>> transaction >>>> > > semantics in cases of remote server (implicit aka sessionless is >>>> already >>>> > > self explanatory) and unifying the syntax with embedded graph usage. >>>> > > >>>> > > Couple of notes: >>>> > > >>>> > > 1. I would favor the begin() instead of create() as it's closer to >>>> > > universally prominent SQL transaction syntax. This would reduce the >>>> > > learning curve for users of TP. >>>> > > >>>> > > 2. From an implementation standpoint, sessionless requests are >>>> queued on >>>> > > the server side and are serviced by the worker thread pool. But a >>>> > > session-ful aka explicit transaction aka managed transaction starts >>>> a new >>>> > > single worked thread pool every time. In a scenario where we have >>>> both >>>> > > session-less and sesion-ful requests being made to the server, the >>>> > resource >>>> > > allocation will not be fair and may lead to starvation for some >>>> requests. >>>> > > This problem exists even today, hence not totally correlated to >>>> what you >>>> > > are proposing but I am afraid a wider adoption of explicit >>>> > > transactions will bring this problem to the spotlight. Hence, we >>>> should >>>> > fix >>>> > > this problem first. A solution would entail converging the worker >>>> pool >>>> > for >>>> > > both session vs session-less requests. >>>> > > >>>> > > 3. You are proposing the idea of having a transaction scoped >>>> traversal >>>> > > object. I agree with it but we need more clarification in behavior >>>> wrt >>>> > the >>>> > > following scenarios: >>>> > > >>>> > > Q. What happens when g.tx().commit() is called on a transaction >>>> scoped >>>> > > traversal object? Does the traversal get closed? >>>> > > Q. Currently, the same traversal object could be used to execute >>>> multiple >>>> > > session-less requests simultaneously in a thread safe manner. Does >>>> the >>>> > same >>>> > > behaviour apply to transaction scoped traversal? If not, then how >>>> do I >>>> > send >>>> > > multiple requests in parallel from the client all scoped to the same >>>> > > transaction on the server? Note that this is different from case of >>>> multi >>>> > > threaded transactions because on the server all requests are scoped >>>> to >>>> > > single transaction i.e. single thread but on the client they may be >>>> > > submitted via multiple threads. >>>> > > >>>> > > Regards, >>>> > > Divij Vaidya >>>> > > >>>> > > >>>> > > >>>> > > On Thu, Mar 18, 2021 at 4:05 PM Stephen Mallette < >>>> [email protected]> >>>> > > wrote: >>>> > > >>>> > > > The Transaction object is a bit important with remote cases >>>> because it >>>> > > > holds the reference to the session. With embedded use cases we >>>> > generally >>>> > > > adhere to ThreadLocal transactions so the Transaction >>>> implementation in >>>> > > > that case is more of a controller for the current thread but for >>>> remote >>>> > > > cases the Transaction implementation holds a bit of state that can >>>> > cross >>>> > > > over threads. In some ways, that makes remote cases feel like a >>>> > "threaded >>>> > > > transaction" so that may be familiar to users?? Here's some >>>> example >>>> > > > syntax I currently have working in a test case: >>>> > > > >>>> > > > g = traversal().withRemote(conn) >>>> > > > gtx = g.tx().create() >>>> > > > assert gtx.isOpen() == true >>>> > > > gtx.addV('person').iterate() >>>> > > > gtx.addV('software').iterate() >>>> > > > gtx.commit() // alternatively you could explicitly rollback() >>>> > > > assert gtx.isOpen() == false >>>> > > > >>>> > > > I hope that documentation changes are enough to unify the syntax >>>> and >>>> > > remove >>>> > > > confusion despite there still being ThreadLocal transactions as a >>>> > default >>>> > > > for embedded cases and something else for remote. At least they >>>> will >>>> > look >>>> > > > the same, even if you technically don't need to do a >>>> g.tx().create() >>>> > for >>>> > > > embedded transactions and can just have the control you always >>>> had with >>>> > > > g.tx().commit() directly. >>>> > > > >>>> > > > Note that the remote Transaction object can support configuration >>>> via >>>> > > > onClose(CLOSE_BEHAVIOR) and you could therefore switch from the >>>> default >>>> > > of >>>> > > > "commit on close" to rollback or manual (or i suppose something >>>> > custom). >>>> > > > It's nice that this piece works. I don't see a point in supporting >>>> > > > onReadWrite(READ_WRITE_BEHAVIOR) as it just doesn't make sense in >>>> > remote >>>> > > > contexts. >>>> > > > >>>> > > > Another point is that I've added what I termed a "Graph Operation" >>>> > > bytecode >>>> > > > instances which is bytecode that isn't related to traversal >>>> > > construction. I >>>> > > > hope this isn't one of those things where we end up wanting to >>>> > deprecate >>>> > > an >>>> > > > idea as fast as we added it but we needed some way to pass >>>> > > commit/rollback >>>> > > > commands and they aren't really part of traversal construction. >>>> They >>>> > are >>>> > > > operations that execute on the graph itself and I couldn't think >>>> of how >>>> > > to >>>> > > > treat them as traversals nicely, so they sorta just exist as a >>>> one off. >>>> > > > Perhaps they will grow in number?? Folks have always asked if >>>> bytecode >>>> > > > requests could get "GraphFeature" information - I suppose this >>>> could >>>> > be a >>>> > > > way to do that?? >>>> > > > >>>> > > > Anyway, I will keep going down this general path as it's appearing >>>> > > > relatively fruitful. >>>> > > > >>>> > > > >>>> > > > >>>> > > > >>>> > > > On Thu, Mar 18, 2021 at 6:34 AM Stephen Mallette < >>>> [email protected] >>>> > > >>>> > > > wrote: >>>> > > > >>>> > > > > > We should just communicate clearly that a simple remote >>>> traversal >>>> > > > > already uses a transaction by default, >>>> > > > > >>>> > > > > That's a good point because even with this change we still have >>>> a >>>> > > > > situation where a single iterated remote traversal will >>>> generally >>>> > mean >>>> > > a >>>> > > > > server managed commit/rollback, but in embedded mode, we have >>>> an open >>>> > > > > transaction (for transaction enabled graphs - TinkerGraph will >>>> behave >>>> > > > more >>>> > > > > like a remote traversal, so more confusion there i guess). I'm >>>> not >>>> > sure >>>> > > > how >>>> > > > > to rectify that at this time except by way of documentation. >>>> > > > > >>>> > > > > The only thing I can think of is that the default for embedded >>>> would >>>> > > have >>>> > > > > to be auto-commit per traversal. In that way it would work like >>>> > remote >>>> > > > > traversals and for graphs that don't support transactions like >>>> > > > TinkerGraph. >>>> > > > > Of course, that's the kind of change that will break a lot of >>>> code. >>>> > > Maybe >>>> > > > > we just keep that idea for another version. >>>> > > > > >>>> > > > > On Thu, Mar 18, 2021 at 4:21 AM <[email protected]> wrote: >>>> > > > > >>>> > > > >> I like the idea of adding transactions for remote traversals >>>> as they >>>> > > > >> close a gap in functionality that we currently have for remote >>>> > > > traversals. >>>> > > > >> We should just communicate clearly that a simple remote >>>> traversal >>>> > > > already >>>> > > > >> uses a transaction by default, meaning that the server will >>>> roll >>>> > back >>>> > > > any >>>> > > > >> changes if any exception occurs. Users often ask about how to >>>> do >>>> > > > >> transactions with a remote traversal because they don't know >>>> about >>>> > > this >>>> > > > and >>>> > > > >> I'm afraid that we might add even more confusion if we now add >>>> > > > transactions >>>> > > > >> for remote traversals. Hence why I think that we should >>>> communicate >>>> > > this >>>> > > > >> clearly when we add remote transactions. >>>> > > > >> >>>> > > > >> Getting this to work in the GLVs should be possible but will >>>> require >>>> > > > some >>>> > > > >> effort. I think we would have to introduce some >>>> > > > >> "DriverRemoteTransactionTraversal" that doesn't submit the >>>> traversal >>>> > > on >>>> > > > a >>>> > > > >> terminal step but saves it and then submits all saved >>>> traversals >>>> > > > together >>>> > > > >> on close(). >>>> > > > >> This also means that we should directly add an async version of >>>> > > close(), >>>> > > > >> maybe closeAsync()? (Same for commit() and rollback()) >>>> > > > >> >>>> > > > >> I also like create() better than traversal() because it would >>>> > confuse >>>> > > me >>>> > > > >> to first start a traversal with traversal() and then also >>>> start a >>>> > > > >> transaction with the same method. >>>> > > > >> >>>> > > > >> -----Ursprüngliche Nachricht----- >>>> > > > >> Von: Stephen Mallette <[email protected]> >>>> > > > >> Gesendet: Donnerstag, 18. März 2021 00:01 >>>> > > > >> An: [email protected] >>>> > > > >> Betreff: Re: [DSISCUSS] Remote Transactions >>>> > > > >> >>>> > > > >> One thing I should have noted about tx().create(). The >>>> create() very >>>> > > > well >>>> > > > >> could have been named traversal(), thus the previous example >>>> reading >>>> > > as: >>>> > > > >> >>>> > > > >> g = traversal().withEmbedded(graph) >>>> > > > >> // or >>>> > > > >> g = traversal().withRemote(conn) >>>> > > > >> gtx = g.tx().traversal() >>>> > > > >> gtx.addV('person').iterate() >>>> > > > >> gtx.addV('software').iterate() >>>> > > > >> gtx.close() // alternatively you could explicitly commit() or >>>> > > rollback() >>>> > > > >> >>>> > > > >> You're basically spawning a new GraphTraversalSource from the >>>> > > > Transaction >>>> > > > >> object rather than from a Graph or AnonymousTraversal. I chose >>>> > > create() >>>> > > > >> because it felt like this looked weird: >>>> > > > >> >>>> > > > >> g = traversal().withRemote(conn).tx().traversal() >>>> > > > >> >>>> > > > >> which would be a weird thing to do I guess, but just seeing >>>> > > > "traversal()" >>>> > > > >> over and over seemed odd looking and I wanted to differentiate >>>> with >>>> > > the >>>> > > > >> Transaction object even though the methods are identical in >>>> what >>>> > they >>>> > > > do. I >>>> > > > >> suppose I also drew inspiration from: >>>> > > > >> >>>> > > > >> Transaction.createdThreadedTx() >>>> > > > >> >>>> > > > >> which I think we might consider deprecating. JanusGraph would >>>> simply >>>> > > > >> expose their own Transaction object in the future that has that >>>> > method >>>> > > > as I >>>> > > > >> imagine folks still use that feature. As far as I know, no >>>> other >>>> > graph >>>> > > > >> implements that functionality and my guess is that no one will >>>> > likely >>>> > > > do so >>>> > > > >> in the future. >>>> > > > >> >>>> > > > >> anyway, if you like traversal() better than create() or the >>>> other >>>> > way >>>> > > > >> around, please let me know >>>> > > > >> >>>> > > > >> On Wed, Mar 17, 2021 at 5:33 PM David Bechberger < >>>> > [email protected] >>>> > > > >>>> > > > >> wrote: >>>> > > > >> >>>> > > > >> > I am in favor of this change as I any idea that unifies the >>>> > multiple >>>> > > > >> > different ways things work in TP will only make it easier to >>>> learn >>>> > > and >>>> > > > >> > adopt. >>>> > > > >> > >>>> > > > >> > I don't know that I have enough knowledge of the inner >>>> workings of >>>> > > > >> > transactions to know what/if this will cause problems. >>>> > > > >> > >>>> > > > >> > Dave >>>> > > > >> > >>>> > > > >> > On Wed, Mar 17, 2021 at 12:57 PM Stephen Mallette >>>> > > > >> > <[email protected]> >>>> > > > >> > wrote: >>>> > > > >> > >>>> > > > >> > > We haven't touched "transactions" since TP3 was originally >>>> > > designed. >>>> > > > >> > > They have remained a feature for embedded use cases even >>>> in the >>>> > > face >>>> > > > >> > > of the >>>> > > > >> > rise >>>> > > > >> > > of remote graph use cases and result in major >>>> inconsistencies >>>> > that >>>> > > > >> > > really bother users. >>>> > > > >> > > >>>> > > > >> > > As we close on 3.5.0, I figured that perhaps there was a >>>> chance >>>> > to >>>> > > > >> > > do something radical about transactions. Basically, I'd >>>> like >>>> > > > >> > > transactions to work irrespective of remote or embedded >>>> usage >>>> > and >>>> > > > >> > > for them to have the >>>> > > > >> > same >>>> > > > >> > > API when doing so. In mulling it over for the last day or >>>> so I >>>> > > had a >>>> > > > >> > > realization that makes me believe that the following is >>>> > possible: >>>> > > > >> > > >>>> > > > >> > > g = traversal().withEmbedded(graph) >>>> > > > >> > > // or >>>> > > > >> > > g = traversal().withRemote(conn) >>>> > > > >> > > gtx = g.tx().create() >>>> > > > >> > > gtx.addV('person').iterate() >>>> > > > >> > > gtx.addV('software').iterate() >>>> > > > >> > > gtx.close() // alternatively you could explicitly commit() >>>> or >>>> > > > >> > > rollback() >>>> > > > >> > > >>>> > > > >> > > // you could still use g for sessionless, but gtx is "done" >>>> > after >>>> > > // >>>> > > > >> > > you close so you will need to create() a new gtx instance >>>> for a >>>> > // >>>> > > > >> > > fresh transaction assert 2 == g.V().count().next() >>>> > > > >> > > >>>> > > > >> > > Note that the create() method on tx() is the new bit of >>>> > necessary >>>> > > > >> syntax. >>>> > > > >> > > Technically, it is a do-nothing for embedded mode (and you >>>> could >>>> > > > >> > > skip it for thread-local auto-transactions) but all the >>>> > > > >> > > documentation can be shifted so that the API is identical >>>> to >>>> > > remote. >>>> > > > >> > > The change would be non-breaking as the embedded >>>> transaction >>>> > > > >> > > approach would remain as it is, but would no longer be >>>> > documented >>>> > > as >>>> > > > >> > > the preferred approach. Perhaps we could one day disallow >>>> it but >>>> > > > >> > > it's a bit of a tangle of ThreadLocal that I'm not sure we >>>> want >>>> > to >>>> > > > >> touch in TP3. >>>> > > > >> > > >>>> > > > >> > > What happens behind the scenes of g.tx().create() is that >>>> in >>>> > > remote >>>> > > > >> > > cases the RemoteConnection constructs a remote Transaction >>>> > > > >> > > implementation which basically is used to spawn new >>>> traversal >>>> > > source >>>> > > > >> > > instances with a session based connections. The remote >>>> > Transaction >>>> > > > >> > > object can't support >>>> > > > >> > transaction >>>> > > > >> > > listeners or special read-write/close configurations, but I >>>> > don't >>>> > > > >> > > think that's a problem as those things don't make a lot of >>>> sense >>>> > > in >>>> > > > >> > > remote use cases. >>>> > > > >> > > >>>> > > > >> > > From the server perspective, this change would also mean >>>> that >>>> > > > >> > > sessions would have to accept bytecode and not just >>>> scripts, >>>> > which >>>> > > > >> > > technically shouldn't be a problem. >>>> > > > >> > > >>>> > > > >> > > One downside at the moment is that i'm thinking mostly in >>>> Java >>>> > at >>>> > > > >> > > the moment. I'm not sure how this all works in other >>>> variants >>>> > just >>>> > > > >> > > yet, but >>>> > > > >> > I'd >>>> > > > >> > > hope to keep similar syntax. >>>> > > > >> > > >>>> > > > >> > > I will keep experimenting tomorrow. If you have any >>>> thoughts on >>>> > > the >>>> > > > >> > matter, >>>> > > > >> > > I'd be happy to hear them. Thanks! >>>> > > > >> > > >>>> > > > >> > >>>> > > > >> >>>> > > > >> >>>> > > > >>>> > > >>>> > >>>> >>>
