As I mentioned in the PR: are we sure the callback is only called on failures in the connection layer? I wouldn't want PLC4X to kill the worker just because we are communicating with a non-standard PLC and one of our protocol layers fires an error while processing the data.
Chris

On 04.08.19, 20:26, "Julian Feinauer" <jfeina...@apache.org> wrote:

Hi all,

so I found the cause and fixed it. When the connection was aborted, we ended up in a situation where we had created a thread pool (the worker pool for Netty) that was never shut down, because no channel was ever created that would have taken care of it later on.

I created PR https://github.com/apache/plc4x/pull/76 against develop. If this PR gets accepted, I suggest creating a bugfix release 0.4.1, as this is a real issue for us in production.

Any concerns with this approach?

Thanks!
Julian

On 2019/08/02 15:05:47, Julian Feinauer <j.feina...@pragmaticminds.de> wrote:
> Hey,
>
> agree, @cdutz... I am running an example right now and it really seems to be like that.
> So I'll try to finish an MWE and perhaps ask on the Netty list :)
>
> Julian
>
> On 02.08.19, 16:58, "Christofer Dutz" <christofer.d...@c-ware.de> wrote:
>
> Hi Julian,
>
> well, if I look into my sock drawer at home, I think we might be leaking some socks... I agree... there are several single socks in there ;-)
>
> But regarding Netty... yes, it is absolutely possible that we're not handling this correctly, as the docs are quite extensive and I didn't bother reading all of them ;-)
>
> So perhaps we should read them or ask a Netty pro.
>
> Chris
>
> On 02.08.19, 16:50, "Julian Feinauer" <j.feina...@pragmaticminds.de> wrote:
>
> Hi all,
>
> we are observing strange behavior in production.
> We are still investigating the exact scenario, and it's a bit complex, as we have many connections to many PLCs and fire many requests through many different channels…
> But what we observe is that we get the well-known “too many open files” exception on a Linux server WHEN one of the PLCs becomes unreachable (the pool then tries many times to recreate the connection).
>
> I just had a quick look at the codebase, and I think we are handling the exceptions wrongly (or not at all?).
> If I understand [1] correctly (I didn't bother to check Netty's docs, as they are rather poor), we should close the socket somewhere, but we ALWAYS call super.exceptionCaught(), which just forwards the exception to the next handler in the channel pipeline and seems to NEVER close the socket.
>
> Am I wrong about that?
>
> We are trying to create an MWE that reproduces this behavior, to check whether we can fix it like that.
>
> Best
> Julian
>
> [1] https://www.baeldung.com/netty-exception-handling
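For readers following the thread, here is a minimal plain-Java sketch of the two ideas discussed: release the worker pool when a connection attempt fails (the leak fixed in the PR), and only tear the connection down for connection-layer failures (Chris's concern). This is a hypothetical model, not PLC4X or Netty code. It assumes an ExecutorService stands in for Netty's worker EventLoopGroup, that transport failures surface as IOException (possibly wrapped), and that every other exception is a protocol-layer error that should be reported without closing anything. The names PlcConnectionModel, connect, exceptionCaught and isConnectionFailure are made up for illustration.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.function.Consumer;

/**
 * Hypothetical plain-JDK model of a PLC connection that owns its worker pool.
 * The ExecutorService stands in for Netty's worker EventLoopGroup (assumption);
 * this is neither PLC4X nor Netty code.
 */
final class PlcConnectionModel<T> implements AutoCloseable {
    private final T channel;
    private final ExecutorService workers;
    private final Consumer<Throwable> protocolErrorHandler;

    private PlcConnectionModel(T channel, ExecutorService workers,
                               Consumer<Throwable> protocolErrorHandler) {
        this.channel = channel;
        this.workers = workers;
        this.protocolErrorHandler = protocolErrorHandler;
    }

    /**
     * Runs the connect step on the given pool. If it fails, no connection
     * object exists that could release the pool later, so release it here.
     */
    static <T> PlcConnectionModel<T> connect(Callable<T> connector, ExecutorService workers,
                                             Consumer<Throwable> protocolErrorHandler) throws Exception {
        try {
            T channel = workers.submit(connector).get();
            return new PlcConnectionModel<>(channel, workers, protocolErrorHandler);
        } catch (Exception e) {
            workers.shutdownNow(); // without this, the pool's threads are never stopped
            throw e;
        }
    }

    T channel() {
        return channel;
    }

    /** Analogue of a handler's exceptionCaught(): close only on transport failures. */
    void exceptionCaught(Throwable cause) {
        if (isConnectionFailure(cause)) {
            close(); // connection layer is broken: release the workers
        } else {
            protocolErrorHandler.accept(cause); // e.g. odd data from a non-standard PLC: report only
        }
    }

    /** Assumption: transport failures are IOExceptions, possibly wrapped in other exceptions. */
    static boolean isConnectionFailure(Throwable cause) {
        int depth = 0;
        // Bounded walk down the cause chain, in case the chain is cyclic.
        for (Throwable t = cause; t != null && depth < 16; t = t.getCause(), depth++) {
            if (t instanceof IOException) {
                return true;
            }
        }
        return false;
    }

    boolean isClosed() {
        return workers.isShutdown();
    }

    @Override
    public void close() {
        workers.shutdown(); // idempotent; lets already-submitted tasks finish
    }
}
```

In Netty itself, the corresponding calls would be ChannelHandlerContext.close() (instead of, or in addition to, fireExceptionCaught(cause)) inside an overridden exceptionCaught(...), and EventLoopGroup.shutdownGracefully() when the connect future fails. Whether PLC4X's protocol layers really report their errors with non-IOException types is an assumption that would need checking against the actual drivers.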