[jira] [Commented] (TINKERPOP-2019) Gremlin.Net.Driver.WebSocketConnection throws System.InvalidOperationException

Florian Hockmann (Jira) Wed, 08 Apr 2020 08:38:24 -0700


    [ 
https://issues.apache.org/jira/browse/TINKERPOP-2019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078396#comment-17078396
 ]


Florian Hockmann commented on TINKERPOP-2019:
---------------------------------------------

I looked into this some more because I wanted to create an issue in the 
{{dotnet}} repo. After looking into [the source code of the {{ManagedWebSocket}} 
class|https://github.com/dotnet/runtime/blob/master/src/libraries/Common/src/System/Net/WebSockets/ManagedWebSocket.cs],
I think the problem might not be caused by calling {{ReceiveAsync}} twice in 
parallel, but rather by calling {{CloseAsync}} while a {{ReceiveAsync}} has not 
completed yet, or vice versa. {{CloseAsync}} also internally calls 
{{ReceiveAsyncPrivate}} to receive the close response from the server.

In general, {{ManagedWebSocket}} shouldn't have a problem when {{CloseAsync}} 
is called while a {{ReceiveAsync}} operation is still in progress, as that 
[issue was already fixed in 
2016|https://github.com/dotnet/runtime/issues/17819].
But I wonder whether we are seeing a race condition here in which the checks 
that were added for that issue succeed although the task has not fully 
completed yet, or something similar. The fact that you needed to perform 
4.5 million calls under bad network conditions before the exception occurred 
(if I understood that correctly, [~dzmitry.lahoda]) could also be seen as a 
hint that we have a rare race condition here.

I see two options for how to proceed. We could either:
 * check whether we can fix this simply by cancelling any pending operations on 
the {{ClientWebSocket}} in our {{WebSocketConnection}} class with a 
{{CancellationTokenSource}} before calling {{CloseAsync}}, or
 * try to reproduce the problem with a minimal example outside of 
Gremlin.Net and then use that example to create an issue in the {{dotnet}} repo.

I think that we need a minimal example that still reproduces this 
exception to get meaningful help from the {{dotnet}} team, as they otherwise 
cannot be sure that the error is not in our usage of {{ClientWebSocket}}. It 
would of course also rule out that possibility for us.

Using a {{CancellationTokenSource}} to cancel any operations before calling 
{{CloseAsync}} is probably a good idea in general, but it would of course be 
good to know whether that already solves the problem.
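
A rough sketch of what the cancellation approach could look like is shown 
below. This is not the actual {{WebSocketConnection}} code: the class name 
{{CancellableWebSocketConnection}} and its members are hypothetical, and it 
assumes a single receive loop, so at most one {{ReceiveAsync}} is pending at 
any time.
{code:c#}
using System;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper for illustration only, not the Gremlin.Net implementation.
public sealed class CancellableWebSocketConnection : IDisposable
{
    private readonly ClientWebSocket _client = new ClientWebSocket();
    private readonly CancellationTokenSource _pendingOperations =
        new CancellationTokenSource();

    // Assumption: only one receive is in flight at a time.
    private Task _pendingReceive = Task.CompletedTask;

    public Task ConnectAsync(Uri uri) =>
        _client.ConnectAsync(uri, _pendingOperations.Token);

    public Task SendAsync(ArraySegment<byte> message) =>
        _client.SendAsync(message, WebSocketMessageType.Binary, true,
            _pendingOperations.Token);

    public Task<WebSocketReceiveResult> ReceiveAsync(ArraySegment<byte> buffer)
    {
        var receive = _client.ReceiveAsync(buffer, _pendingOperations.Token);
        _pendingReceive = receive; // remembered so CloseAsync can wait for it
        return receive;
    }

    public async Task CloseAsync()
    {
        // Cancel whatever is still pending before starting the close handshake.
        _pendingOperations.Cancel();
        try
        {
            await _pendingReceive.ConfigureAwait(false);
        }
        catch (Exception)
        {
            // Cancelled or failed; the original caller of ReceiveAsync observes it.
        }

        // Only attempt a graceful close if the socket is still open.
        if (_client.State == WebSocketState.Open)
        {
            await _client.CloseAsync(WebSocketCloseStatus.NormalClosure,
                string.Empty, CancellationToken.None).ConfigureAwait(false);
        }
    }

    public void Dispose()
    {
        _pendingOperations.Dispose();
        _client.Dispose();
    }
}{code}
One caveat with this sketch: cancelling an in-flight operation on a 
{{ClientWebSocket}} may abort the socket, in which case the state check skips 
the close handshake entirely. Whether that trade-off is acceptable would need 
to be verified together with the question of whether it avoids the exception.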

I'll try to follow up on this and try out one or both approaches, but I'm not 
sure yet when I'll have the time for this. So, if anyone wants to take this up, 
then that would be greatly appreciated.

> Gremlin.Net.Driver.WebSocketConnection throws System.InvalidOperationException
> ------------------------------------------------------------------------------
>
>                 Key: TINKERPOP-2019
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-2019
>             Project: TinkerPop
>          Issue Type: Bug
>          Components: dotnet
>    Affects Versions: 3.3.3
>         Environment: Azure App Service
>            Reporter: Sami
>            Priority: Critical
>         Attachments: image-2020-02-21-05-32-58-730.png, 
> image-2020-02-21-05-33-27-246.png, invalid.txt
>
>
> We're getting the following {{System.InvalidOperationException}} error 
> message:
> {code:c#}
> "There is already one outstanding 'SendAsync' call for this WebSocket 
> instance. ReceiveAsync and SendAsync can be called simultaneously, but at 
> most one outstanding operation for each of them is allowed at the same time.
> Problem Id:
> System.InvalidOperationException at 
> Gremlin.Net.Driver.WebSocketConnection+<SendMessageAsync>d__5.MoveNext"{code}
>  
>  We get this exception sporadically and only a few times out of thousands. 
> Unfortunately we have not been able to reproduce it.
>   
>  I understand that when dealing with web sockets, it is allowed to have only 
> a single pending "send" or a single pending "receive".
>   
>  After looking at GitHub's WebSocketConnection class, I don't see any 
> orchestration between SendMessageAsync's {{_client.SendAsync}} (currently 
> line 54) and ReceiveMessageAsync's {{_client.ReceiveAsync}} (currently line 
> 66). 
>   
>  Reference Link: 
>  
> [https://github.com/apache/tinkerpop/blob/master/gremlin-dotnet/src/Gremlin.Net/Driver/WebSocketConnection.cs]
>   
>  I'm wondering if not having orchestration in the WebSocketConnection class 
> to keep the single pending "send" or a single pending "receive" rule may be 
> the cause. 
>   
>  In our .NET Core web api application, we create the GremlinConnection as a 
> singleton in Startup.cs and then have one central call that makes Gremlin 
> calls; i.e. it's a very straightforward implementation.
>   
>  Startup.cs:
> {code:c#}
> public void ConfigureServices(IServiceCollection services)
> {
>     //...other stuff removed for brevity
>     services.AddSingleton<IGremlinConnection, GremlinConnection>();
> }{code}
>  
>  Reader.cs:
> {code:c#}
> public async Task<IReadOnlyCollection<dynamic>> ExecuteGremlinQuery(string query)
> {
>     try
>     {
>         return await _gremlinConnection.Client.SubmitAsync<dynamic>(query);
>     }
>     catch (Gremlin.Net.Driver.Exceptions.ResponseException responseException)
>     {
>         //our error handling removed for brevity!    
>     }
> }{code}
>   
>  We use the Gremlin.Net version 3.3.3 nuget package and the 
> Microsoft.NETCore.App SDK
>   
>  Would it be possible to identify if this is indeed a bug on Gremlin.NET? 
>  And if it is, any thoughts on a best-practice (temporary) work-around that 
> we can implement?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
