[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16081972#comment-16081972 ]

Goldstein Lyor commented on SSHD-754:
-------------------------------------

Looks like a good idea. I believe, though, that the better solution would be to implement such "throttling" at the *channel* level rather than the *session* level - after all, we might want to control each channel separately, i.e. in {{ChannelOutputStream}}/{{ChannelAsyncOutputStream}} - write/flush. Furthermore, the throttle rate should be *configurable*, where zero (the default) means "no throttling".

> OOM in sending data for channel
> -------------------------------
>
>                 Key: SSHD-754
>                 URL: https://issues.apache.org/jira/browse/SSHD-754
>             Project: MINA SSHD
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Eugene Petrenko
>
> I have an implementation of an SSHD server with the library. It sends gigabytes (e.g. 5 GB) of data as command output.
> Starting from PuTTY plink 0.68 (this also includes plink 0.69) we started to see OOM errors. Checking memory dumps showed that most of the memory is consumed from org.apache.sshd.common.session.AbstractSession#writePacket(org.apache.sshd.common.util.buffer.Buffer).
> In the hprof dump I see thousands of PendingWriteFuture objects (each of which, incidentally, holds a reference to a logger instance), and those objects are created only from this function.
> It is clear the session is running through a rekey - I can see the kexState indicating the progress.
> Is there a way to artificially limit the sending queue, no matter whether the remote window allows sending that enormous amount of data? By my estimation, the window was reported to be around 1.5 GB or more. Maybe such a huge window size was caused by the arithmetic overflow that was fixed in SSHD-701.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
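The channel-level, configurable throttle Lyor describes could look roughly like the following. This is a minimal self-contained sketch, not actual MINA SSHD API; the `ChannelWriteThrottle` name and the `maxPendingPackets` setting are hypothetical illustrations of "zero means no throttling":

```kotlin
import java.util.concurrent.Semaphore

// Hypothetical sketch of a per-channel write throttle: each channel would own
// its own limiter, and a configured value of zero (the default) disables it.
class ChannelWriteThrottle(private val maxPendingPackets: Int = 0) {
    private val semaphore: Semaphore? =
        if (maxPendingPackets > 0) Semaphore(maxPendingPackets) else null

    // Called before queueing a packet; blocks once too many writes are in flight.
    fun beforeWrite() = semaphore?.acquire()

    // Called from the write-future listener once the packet has left the queue.
    fun afterWrite() = semaphore?.release()

    // Remaining capacity; unlimited when throttling is disabled.
    fun pendingSlots(): Int = semaphore?.availablePermits() ?: Int.MAX_VALUE
}

fun main() {
    val off = ChannelWriteThrottle()   // zero: no throttling, never blocks
    off.beforeWrite()
    val on = ChannelWriteThrottle(2)
    on.beforeWrite()
    on.beforeWrite()
    check(on.pendingSlots() == 0)      // a third beforeWrite() would block
    on.afterWrite()
    check(on.pendingSlots() == 1)
}
```

Keeping the limiter per channel (rather than per session) means one slow channel cannot starve writes on its siblings, which matches the separation suggested above.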
[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079673#comment-16079673 ]

Goldstein Lyor commented on SSHD-754:
-------------------------------------

I have not had time to really think about the scenario you presented, and nowadays it will take me a while to find some free time to review it. Please feel free to go ahead and publish a pull request once you have a proposal in mind - it will be easier for us to review it and to better understand the problem and its possible fix.

That being said, off the top of my head I believe one can control the window size via properties and/or window-size SSH messages (I don't remember the exact message exchange offhand). AFAIK, the window-size mechanism provides a way to throttle the data rate, so maybe that can be a way to address this issue as well.
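The window mechanism Lyor refers to (SSH_MSG_CHANNEL_WINDOW_ADJUST in RFC 4254) can be pictured with a toy model: the sender may transmit only as many bytes as the receiver has granted, so a receiver that grants space slowly effectively throttles the sender. This is a simplified illustration, not the library's actual Window class:

```kotlin
// Toy model of SSH channel flow control (RFC 4254): the remote side grants a
// byte budget, the sender consumes it, and SSH_MSG_CHANNEL_WINDOW_ADJUST
// messages replenish it. A small initial window throttles the sender; the 2 GB
// window reported in this issue effectively removes that brake.
class RemoteWindow(initialSize: Long) {
    var available: Long = initialSize
        private set

    // Returns how many bytes of `want` may actually be sent right now.
    fun consume(want: Long): Long {
        val granted = minOf(want, available)
        available -= granted
        return granted
    }

    // Models receiving SSH_MSG_CHANNEL_WINDOW_ADJUST from the peer.
    fun adjust(bytes: Long) {
        available += bytes
    }
}

fun main() {
    val window = RemoteWindow(4096L)
    check(window.consume(10_000L) == 4096L)  // capped by the window
    check(window.consume(1L) == 0L)          // exhausted: sender must wait
    window.adjust(1024L)                     // peer read some data
    check(window.consume(1L) == 1L)
}
```

The catch, as the rest of this thread shows, is that the window is dictated by the *client*; a server cannot rely on it alone when the client advertises an enormous window.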
[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079668#comment-16079668 ]

Eugene Petrenko commented on SSHD-754:
--------------------------------------

It is indeed a non-trivial problem that I have discovered. First, I want to make sure the proposed fix is good enough to be converted into a pull request.

The problem I have on the SSH server is that:
1) the channel's remote window is set to 2 GB
2) the client is slow to receive data (data is generated faster than it is consumed by the network/client), because of
   2a) a slow network connection, or
   2b) a rekey in progress

Because of 1) and 2) we get an OOM - too many data chunks are queued.

The proposed solution is to limit the send queue only for DATA and EXTENDED_DATA messages (is that correct?) by blocking a too-active sender (a deadlock is possible if we block a NIO/callback thread).

An alternative solution could be to implement similar logic in org.apache.sshd.common.channel.ChannelOutputStream and org.apache.sshd.common.channel.ChannelAsyncOutputStream. However, there might be other usages of Channel/Session, so fixing those two classes may not be enough, and there is the question of how to avoid code duplication.

What would you say?
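The deadlock Eugene warns about arises when the thread blocked on the limiter is the same NIO/callback thread that would have completed the pending write futures and released the permit. One hedged way to sketch a guard (hypothetical names, not MINA SSHD code) is to acquire with a timeout so the write can fail fast instead of hanging forever:

```kotlin
import java.util.concurrent.Semaphore
import java.util.concurrent.TimeUnit

// Sketch: cap the number of queued DATA packets, but never block indefinitely.
// If a permit cannot be obtained within the timeout (e.g. because we are on
// the NIO/callback thread that was supposed to release it), give up and let
// the caller fail the write instead of deadlocking.
class BoundedSendQueue(maxPending: Int, private val timeoutMs: Long) {
    private val semaphore = Semaphore(maxPending)

    // Returns false when the queue stayed full for the whole timeout.
    fun tryEnqueue(): Boolean =
        semaphore.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)

    // Invoked from the write-completion listener.
    fun onWriteCompleted() = semaphore.release()
}

fun main() {
    val queue = BoundedSendQueue(maxPending = 1, timeoutMs = 50L)
    check(queue.tryEnqueue())     // first packet fits
    check(!queue.tryEnqueue())    // queue full: times out instead of hanging
    queue.onWriteCompleted()
    check(queue.tryEnqueue())     // slot freed by the completion listener
}
```

A production fix would additionally need to detect whether the current thread is an I/O thread and skip blocking entirely in that case; the timeout here is only a crude safety net.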
[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16079638#comment-16079638 ]

Goldstein Lyor commented on SSHD-754:
-------------------------------------

If indeed you have a fix / improvement in mind, a pull-request is the best way to have it evaluated and eventually merged...
[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075469#comment-16075469 ]

Eugene Petrenko commented on SSHD-754:
--------------------------------------

After reproducing the problem in tests I was able to come up with the following patch (in Kotlin) for an inheritor of ServerSessionImpl, which fixes the test:

{code}
private class PressureLock {
    // Allow at most 100 in-flight packets; acquire() blocks the writer beyond that
    private val semaphore = Semaphore(100)

    private val listener = object : SshFutureListener<IoWriteFuture> {
        override fun operationComplete(future: IoWriteFuture) {
            // A packet has left the queue - let the next writer proceed
            semaphore.release()
        }
    }

    fun acquire(): SshFutureListener<IoWriteFuture> {
        semaphore.acquire()
        return listener
    }
}

private val CHANNEL_STDOUT_LOCK = PressureLock()
private val CHANNEL_STDERR_LOCK = PressureLock()

override fun writePacket(buffer: Buffer): IoWriteFuture {
    // Workaround for VCS-797 and https://issues.apache.org/jira/browse/SSHD-754:
    // the trick is to block the writer thread once there are more than
    // 100 messages in either the rekey wait queue or the NIO write queue
    val lock = when (buffer.array()[buffer.rpos()]) {
        SshConstants.SSH_MSG_CHANNEL_DATA -> CHANNEL_STDOUT_LOCK
        SshConstants.SSH_MSG_CHANNEL_EXTENDED_DATA -> CHANNEL_STDERR_LOCK
        else -> null
    }?.acquire()

    val future = super.writePacket(buffer)
    if (lock != null) {
        future.addListener(lock)
    }
    return future
}
{code}
[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16075341#comment-16075341 ]

Eugene Petrenko commented on SSHD-754:
--------------------------------------

It looks like, starting from April 2016, plink enables 'simple' mode for SSH. It was done by commit b22c0b6f3e6f5254270a89f86df3edfc4da829d2 (https://git.tartarus.org/?p=simon/putty.git;a=commit;h=b22c0b6f3e6f5254270a89f86df3edfc4da829d2). Starting from 0.68 it sends 2 GB (0x7fffffff) as the receive window size. That makes the SSHD library vulnerable to OOM.

The simplified steps to reproduce are as follows. We need a command that returns a huge amount of data, say bigger than the heap size. Then it is enough to have a client that is slow to read data (e.g. a slow channel); the server will easily queue too many packets and run out of memory.

It looks like org.apache.sshd.common.channel.ChannelOutputStream and similar classes should take into account not only the window size but also their own write queue.
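Eugene's point that the output stream should watch its own write queue, not just the remote window, could be sketched as a wrapper that tracks in-flight bytes. The names here are hypothetical and self-contained; this is not the actual ChannelOutputStream:

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Sketch: track bytes handed to the transport but not yet flushed, and refuse
// to accept more than a fixed budget even when the remote window is huge.
class QueueAwareWriter(private val maxQueuedBytes: Long) {
    private val queued = AtomicLong(0)

    // Returns true if the chunk was accepted; false means the caller should
    // wait until earlier writes complete, regardless of the window size.
    fun offer(chunkSize: Long): Boolean {
        val next = queued.addAndGet(chunkSize)
        if (next > maxQueuedBytes) {
            queued.addAndGet(-chunkSize)  // roll back the reservation
            return false
        }
        return true
    }

    // Invoked from the write-completion callback.
    fun completed(chunkSize: Long) {
        queued.addAndGet(-chunkSize)
    }
}

fun main() {
    val writer = QueueAwareWriter(maxQueuedBytes = 100L)
    check(writer.offer(60L))
    check(!writer.offer(60L))   // would exceed the 100-byte budget
    writer.completed(60L)
    check(writer.offer(60L))    // accepted once the earlier write completed
}
```

Combining this local budget with the remote window check means memory use stays bounded by the budget even against a client advertising a 2 GB window.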
[jira] [Commented] (SSHD-754) OOM in sending data for channel
[ https://issues.apache.org/jira/browse/SSHD-754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16074753#comment-16074753 ]

Eugene Petrenko commented on SSHD-754:
--------------------------------------

I did a try with 1.4.0, where SSHD-701 is closed. Same issue. I have an enormous remote window size, so everything fits into the window, generating an endless pending-write queue.