Hi all,

I found the cause and fixed it. When a connection attempt was aborted, we had already created a thread pool (the Netty worker pool), and it was never shut down, because no channel was ever created that would later have taken care of the shutdown.

I opened PR https://github.com/apache/plc4x/pull/76 against develop.
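
For illustration, the leak pattern and its fix can be sketched roughly as follows. This is not the code from the PR, only a minimal sketch: it uses a plain JDK ExecutorService as a stand-in for the Netty worker pool, and the names PooledConnection, ConnectSketch, Connector and lastPool are made up for this example. With Netty itself, the analogous step would presumably be to call shutdownGracefully() on the worker EventLoopGroup when the connect fails.

```java
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical connection that owns its worker pool and shuts it down on close. */
final class PooledConnection implements AutoCloseable {
    private final ExecutorService workers;

    PooledConnection(ExecutorService workers) {
        this.workers = workers;
    }

    ExecutorService workers() {
        return workers;
    }

    @Override
    public void close() {
        // Normal path: whoever closes the connection also releases the pool.
        workers.shutdown();
    }
}

final class ConnectSketch {
    /** Hypothetical connect step that may fail, e.g. because the PLC is unreachable. */
    interface Connector {
        void connect() throws IOException;
    }

    // Exposed only so that the sketch can be checked from outside.
    static ExecutorService lastPool;

    static PooledConnection open(Connector connector) throws IOException {
        // The pool is created BEFORE we know whether the connect succeeds.
        ExecutorService workers = Executors.newFixedThreadPool(2);
        lastPool = workers;
        try {
            connector.connect();
            return new PooledConnection(workers);
        } catch (IOException | RuntimeException e) {
            // Without this shutdown, every failed attempt leaves a pool behind,
            // because no connection object exists that would ever close it.
            workers.shutdown();
            throw e;
        }
    }
}
```

The point of the sketch is that the resource is released on the failure path by the code that created it, since no connection object exists yet that could do so later. A connection pool that retries many times against an unreachable PLC would otherwise accumulate one orphaned worker pool per attempt.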
If this PR gets accepted, I suggest we create a bugfix release 0.4.1, as this is a real issue for us in production.
Any concerns with this approach?

Thanks!
Julian

On 2019/08/02 15:05:47, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
> Hey,
>
> agreed, @cdutz... I am just running an example and it really seems to be like that.
> So I'll try to finish an MWE and perhaps ask on the Netty list : )
>
> Julian
>
> On 02.08.19, 16:58, "Christofer Dutz" <christofer.d...@c-ware.de> wrote:
>
> > Hi Julian,
> >
> > Well, if I look into my sock drawer at home, I think we might be leaking
> > some socks ... I agree ... there are several single socks in there ;-)
> >
> > But regarding Netty ... yes, it is absolutely possible that we're not handling
> > this correctly, as the docs are quite extensive and I didn't bother reading
> > all of them ;-)
> >
> > So perhaps we should read them or ask some Netty pro.
> >
> > Chris
> >
> > On 02.08.19, 16:50, "Julian Feinauer" <j.feina...@pragmaticminds.de> wrote:
> >
> > > Hi all,
> > >
> > > we observe a strange behavior in production.
> > > We are still investigating the exact scenario, and it’s a bit complex,
> > > as we have many connections to many PLCs and fire many requests through many
> > > different channels…
> > > But what we observe is that we get the well-known “too many open
> > > files” exception on a Linux server WHEN one of the PLCs becomes unreachable
> > > (the pool will try many times to recreate the connection).
> > >
> > > I just checked the codebase for a second, and I think we are handling
> > > the exceptions wrong (or not at all?).
> > > If I understand it correctly from [1] (I didn’t bother to check Netty's
> > > docs, as they are rather poor), we should close the socket somewhere, but we
> > > ALWAYS call super.exceptionCaught(), which just propagates the exception
> > > upward in the channel hierarchy but seems to NEVER close the socket.
> > >
> > > Am I wrong about that?
> > >
> > > We are trying to create an MWE which reproduces that behavior, to check
> > > whether we can fix it like that.
> > >
> > > Best
> > > Julian
> > >
> > > [1] https://www.baeldung.com/netty-exception-handling