Ok, disregard this thread. I think there is a problem with NetworkStream and I have a testcase, but I will start a new thread for this.

On 9/17/2016 16:11, psant...@codicesoftware.com wrote:
I'm no longer sure the issue is related to async now. I'm experiencing some weird behavior with sockets. I'll try with Mono 4.6 instead of 4.4 (if I manage to build it) and then retry.

Basically what I'm tracking now is this:

server (mono on raspberry): sends N bytes using a simple BufferedStream on top of NetworkStream on top of the TCP socket.

  2016-09-17 00:49:29,802 INFO  buffered-bug -1794771920 - DataPacketMessage.Serialize going to write: 1509839 bytes
  2016-09-17 00:49:30,777 INFO  buffered-bug -1794771920 - TrafficControlStream.Write 1509839 bytes
  2016-09-17 00:49:30,778 INFO  buffered-bug -1794771920 - DataPacketMessage.Serialize wrote: 1509839 bytes


client (.net on windows): reads => but it never finishes. The bytes to read is fine, but it seems the total never arrives.
  2016-09-17 02:49:17,100 INFO  buffered-bug - DataPacket going to read 1509839 bytes
  2016-09-17 02:49:17,100 INFO  buffered-bug - BinaryReader.SpecialRead going to read 1509839 bytes
  2016-09-17 02:49:17,100 INFO  buffered-bug - BinaryReader.SpecialRead read 1360/1509839 bytes
  2016-09-17 02:49:18,827 INFO  buffered-bug - BinaryReader.SpecialRead read 327760/1509839 bytes


It happens randomly in several threads. It might just be a bug on my code, although I can't reproduce it with the server on a PC (but maybe is some concurrency thing or some stupid mistake in my code, it must be).

I'm trying to isolate a test case.

When it freezes, there's no data queued on the server side (netstat).


Thanks,

pablo



On 9/15/2016 2:11, psant...@codicesoftware.com wrote:
Hi Ludovic,

Please find attached the log running as follows:

sudo MONO_LOG_MASK=io-threadpool MONO_LOG_LEVEL=debug mono plasticd.exe --console > /home/pablo/mono_thread_pool.log.txt

Last line written when the issue happens.

Thanks,

pablo

On 9/13/2016 15:25, Ludovic Henry wrote:

Hi Pablo,

If you want to have more logging for async io, please pass `MONO_LOG_MASK=io-threadpool MONO_LOG_LEVEL=debug` as environment variables. You can send us the output for your run, and we will take a look.

Also, on 4.6 the default backend for our async io is poll (not epoll or kqueue), but you can force their use by passing `MONO_ENABLE_AIO=1` as an environment variable.

 

Finally, maybe it’s an issue with the work item being enqueued on the threadpool, but no worker thread being spawned for it (it’s very unlikely). Could you try setting `ThreadPool.SetMinThreads(64, 64);` or something bigger than 64, it’s just to check if that’s it.

I hope it helps,

Ludovic

 

From: Mono-devel-list <mono-devel-list-boun...@lists.dot.net> on behalf of "psant...@codicesoftware.com" <psant...@codicesoftware.com>
Date: Tuesday, 13 September 2016 at 15:12
To: "mono-devel-list@lists.dot.net" <mono-devel-list@lists.dot.net>
Subject: Re: [Mono-dev] Potential issue with async sockets in Raspberry ARM – Mono 4.4.0

 

Hi Ivo,

Yes, we already checked the stack. I should have included it on my first email, sorry for that.

I didn't find anything strange, but I'm including it just in case I missed something.

Regarding the pending data in the receive queue: sure, I will rerun the scenario and check this.


== Stack trace when the server stops answering on a given socket ==

I'll add some comments to the stack so that you get more info about what is going on (in bold):

"Threadpool worker"
"<unnamed thread>"
"ActivitySchedulerThread"
"<unnamed thread>"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.IO.InotifyWatcher.ReadFromFD (intptr,byte[],intptr) <0x0003f> ==> this is just the log4net watcher, monitoring if the log config file changes.
  at System.IO.InotifyWatcher.Monitor () <0x0005f>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0009f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001f7>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x0002b>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005b>
  at System.Threading.ThreadHelper.ThreadStart () <0x0002f>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000c7>

"<unnamed thread>"
"<unnamed thread>"
"<threadpool thread>"

This is our "ActivityScheduler thread" sleeping. Waiting to do some background task.

"Timer-Scheduler"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.WaitHandle.WaitOne_internal (System.Threading.WaitHandle,intptr,int,bool) <0x00043>
  at System.Threading.WaitHandle.WaitOne (int,bool) <0x0008b>
  at System.Threading.WaitHandle.WaitOne (int) <0x0002b>
  at System.Threading.Timer/Scheduler.SchedulerThread () <0x00de7>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0009f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001f7>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x0002b>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005b>
  at System.Threading.ThreadHelper.ThreadStart () <0x0002f>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000c7>


This must be the main server thread waiting on new connections (on Accept)

"Finalizer"
"Threadpool worker"
"<unnamed thread>"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Net.Sockets.Socket.Accept_internal (intptr,int&,bool) <0x0003f>
  at System.Net.Sockets.Socket.Accept_internal (System.Net.Sockets.SafeSocketHandle,int&,bool) <0x00043>
  at System.Net.Sockets.Socket.Accept () <0x00037>
  at System.Net.Sockets.TcpListener.AcceptSocket () <0x0002f>
  at PlasticPipe.Remoting.Tcp.SocketServerChannel.AcceptConnection () <0x00027>
  at PlasticPipe.Remoting.Tcp.SocketServerChannel.WaitForConnections () <0x0004f>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0009f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001f7>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x0002b>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005b>
  at System.Threading.ThreadHelper.ThreadStart () <0x0002f>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000c7>



This must be the the other server thread waiting on new connections (on Accept) => we listen on 2 ports normally (regular tcp and ssl, new versions will use only one, but that's a different story)

"Threadpool worker"
"<unnamed thread>"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Net.Sockets.Socket.Accept_internal (intptr,int&,bool) <0x0003f>
  at System.Net.Sockets.Socket.Accept_internal (System.Net.Sockets.SafeSocketHandle,int&,bool) <0x00043>
  at System.Net.Sockets.Socket.Accept () <0x00037>
  at System.Net.Sockets.TcpListener.AcceptSocket () <0x0002f>
  at PlasticPipe.Remoting.Tcp.SocketServerChannel.AcceptConnection () <0x00027>
  at PlasticPipe.Remoting.Tcp.SocketServerChannel.WaitForConnections () <0x0004f>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0009f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001f7>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x0002b>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005b>
  at System.Threading.ThreadHelper.ThreadStart () <0x0002f>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000c7>


This is another thread waiting on UDP requests. Nothing to do with the failing scenario. This is just here to wait for broadcasts requests from clients to discover all the available servers on the network. That's all.

"Threadpool worker"
"<unnamed thread>"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Net.Sockets.Socket.ReceiveFrom_internal (intptr,byte[],int,int,System.Net.Sockets.SocketFlags,System.Net.SocketAddress&,int&) <0x00057>
  at System.Net.Sockets.Socket.ReceiveFrom_internal (System.Net.Sockets.SafeSocketHandle,byte[],int,int,System.Net.Sockets.SocketFlags,System.Net.SocketAddress&,int&) <0x0007b>
  at System.Net.Sockets.Socket.ReceiveFrom_nochecks_exc (byte[],int,int,System.Net.Sockets.SocketFlags,System.Net.EndPoint&,bool,int&) <0x0008b>
  at System.Net.Sockets.Socket.ReceiveFrom (byte[],int,int,System.Net.Sockets.SocketFlags,System.Net.EndPoint&) <0x000a3>
  at System.Net.Sockets.Socket.ReceiveFrom (byte[],System.Net.EndPoint&) <0x0004f>
  at System.Net.Sockets.UdpClient.Receive (System.Net.IPEndPoint&) <0x0007f>
  at k.a () <0x00153>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0009f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001f7>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x0002b>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005b>
  at System.Threading.ThreadHelper.ThreadStart () <0x0002f>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000c7>



I launched the test on Console. So this is just waiting on the Console for some key to be hit... regular server in daemon mode doesn't do that.

"<unnamed thread>"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.IO.MonoIO.Read (intptr,byte[],int,int,System.IO.MonoIOError&) <0x0004b>
  at System.IO.MonoIO.Read (System.Runtime.InteropServices.SafeHandle,byte[],int,int,System.IO.MonoIOError&) <0x0006f>
  at System.IO.FileStream.ReadData (System.Runtime.InteropServices.SafeHandle,byte[],int,int) <0x00047>
  at System.IO.FileStream.ReadInternal (byte[],int,int) <0x00087>
  at System.IO.FileStream.Read (byte[],int,int) <0x000ff>
  at System.IO.StreamReader.ReadBuffer () <0x0035f>
  at System.IO.StreamReader.Read () <0x0005f>
  at System.TermInfoDriver.ReadKeyInternal (bool&) <0x00113>
  at System.TermInfoDriver.ReadLine () <0x000a3>
  at System.ConsoleDriver.ReadLine () <0x0002b>
  at System.Console.ReadLine () <0x000df>
  at akn.b (Codice.CM.Server.SystemRunner,string) <0x00073>
  at akn.a (aas) <0x0064b>
  at akn.a (string[]) <0x000c7>
  at (wrapper runtime-invoke) <Module>.runtime_invoke_int_object (object,intptr,intptr,intptr) <0x000eb>

And this is the "audit logger" thread, nothing to do with the network.

"<threadpool thread>"
"Audit logger"  at <unknown> <0xffffffff>
  at (wrapper managed-to-native) System.Threading.WaitHandle.WaitOne_internal (System.Threading.WaitHandle,intptr,int,bool) <0x00043>
  at System.Threading.WaitHandle.WaitOne () <0x00063>
  at Codice.Audit.Log.a () <0x0016b>
  at System.Threading.ThreadHelper.ThreadStart_Context (object) <0x0009f>
  at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x001f7>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object,bool) <0x0002b>
  at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext,System.Threading.ContextCallback,object) <0x0005b>
  at System.Threading.ThreadHelper.ThreadStart () <0x0002f>
  at (wrapper runtime-invoke) object.runtime_invoke_void__this__ (object,intptr,intptr,intptr) <0x000c7>


I don't see any evidence of any thread being awaked to serve the request.


I suspect it MUST be an issue with our code, although we are running on production on Linux on PCs without issues (for years). But there might be something I'm missing.

Thanks,

pablo


On 9/13/2016 9:10, Ivo Smits wrote:

Hi Pablo,

You can send a SIGQUIT signal to mono (killall -SIGQUIT mono or ctrl+\ might work as well) to get a stacktraces for all live threads. This might show possible problems with threadpool exhaustion or other blocking calls.

Have you checked with netstat if there's data in de receive queue?

Ivo

Op 13-9-2016 om 0:45 schreef psant...@codicesoftware.com:

Hi Alan,

I'll try to create a small test case. We are experiencing this running a Plastic SCM server on raspberry, and a client on a different machine. It will take me a few hours to isolate a test case taking just the required parts from the plastic network protocol.

Meanwhile: do you have any suggestion to try to check whether my assumption is correct? I mean, I suspect somehow the BeginReceive never awakes. I guess it is using epoll down the stairs. Any log or something I can build to make sure the issue is on my code and not on Mono?

Thanks,

pablo

On 9/12/2016 11:37, Alan McGovern wrote:

Can you share the code to repro the issue?

Sent from my iPhone


On 12 Sep 2016, at 09:39, "psant...@codicesoftware.com" <psant...@codicesoftware.com> wrote:

Thanks for the hint Chris,

We are actually enqueuing the request and doing the actual read on a different threadpool.

That being said, we are reproducing this issue with a single client, I mean, single thread attending on the server. Not sure what can go wrong, tcp connection is still there, client can send, it simply looks like the server never wakes up to attend it, randomly.

Any other hints would be appreciated.

Thanks,

pablo

On 9/9/2016 18:11, Chris Swiedler wrote:

From what I understand it’s dangerous to do blocking reads on the callback from something like BeginReceive. The callback will happen on a threadpool thread, and the blocking reads could then cause the threadpool to be exhausted. I don’t know if that’s causing your specific problem (it could, if the reads really do block and you have enough of them) but it’s something to watch out for.

 

chris

 

From: Mono-devel-list [mailto:mono-devel-list-boun...@lists.dot.net] On Behalf Of psant...@codicesoftware.com
Sent: Friday, September 09, 2016 2:42 AM
To: mono-devel-list@lists.dot.net
Cc: dude <rdea...@codicesoftware.com>
Subject: [Mono-dev] Potential issue with async sockets in Raspberry ARM – Mono 4.4.0

 

Hi,

I’m posting here hoping someone can throw some light into the problem. I'm a little bit lost now.


Our server code (Plastic SCM) running on Raspberry on Mono 4.4.0 (exact version below) ends up not awaking from socket.BeginReceive after a while.

I mean, client connects and requests data in 4MB chunks, and depending on the run, it can transfer a few GB but it ends up not awaking. The client just sits waiting on a “socket recv” but the server doesn’t answer. Connection is established (can be checked at OS level).

The code could be simplified as follows:

            mSocket.BeginReceive(buffer, 0, 0, SocketFlags.None,
                OnMessageReceived, null);

_OnMessageReceived_ => does the EndReceive and then reads data and enqueues the request for a threadpool to attend it. Once the request is attented and the response sent, BeginReceive is invoked again. Important: all “BeginReceive()” calls are done from the same thread which NEVER dies.


All we use the BeginReceive for is to decouple socket from thread, so we do not have a 1-1. You see we do pass “zero” as bytes to read, because all we want to achieve is to get awakened when new data is received, then just read using blocking calls (no async).

I’m asking if there could be something about Mono because I read 4.6 release notes (http://www.mono-project.com/docs/about-mono/releases/4.6.0/) and the “atomic fixes for ARM64”. Could it be related somehow.

The same code runs on Windows and Linux PCs (even Macs) without issues. We use the same code on all our production servers and even our Cloud roles, and we are not aware of issues.
Now we are testing a new faster storage and using Raspberry to check how fast it goes on slower hardware.

Complete Mono version:
Mono JIT compiler version 4.4.0 (tarball Tue Jun 14 13:44:06 UTC 2016)
Copyright (C) 2002-2014 Novell, Inc, Xamarin Inc and Contributors. www.mono-project.com
        TLS:           __thread
        SIGSEGV:       normal
        Notifications: epoll
        Architecture:  armel,vfp+hard
        Disabled:      none
        Misc:          softdebug
        LLVM:          supported, not enabled.
        GC:            sgen
.

Thanks!

Pablo Santos
www.plasticscm.com

 

_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.dot.net
http://lists.dot.net/mailman/listinfo/mono-devel-list





_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.dot.net
http://lists.dot.net/mailman/listinfo/mono-devel-list





_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.dot.net
http://lists.dot.net/mailman/listinfo/mono-devel-list






_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.dot.net
http://lists.dot.net/mailman/listinfo/mono-devel-list



_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.dot.net
http://lists.dot.net/mailman/listinfo/mono-devel-list


_______________________________________________
Mono-devel-list mailing list
Mono-devel-list@lists.dot.net
http://lists.dot.net/mailman/listinfo/mono-devel-list

Reply via email to