On Mon, Feb 18, 2008 at 05:38:54PM +0800, Kacheong Poon wrote:
> Nicolas Williams wrote:
> > Two points.  First, I imagine that fixing this so that the app can
> > reduce SO_RCVBUF should not be nearly as hard as adding TCP auto-tuning
> > to Solaris.
> 
> There is another problem with fine-tuning the TCP receive
> window.  The problem is that TCP uses 16 bits to represent the
> receive window.  For windows larger than 64KB, TCP uses a scaling
> factor, but this factor is negotiated at the start of the
> connection.  So your app needs to start with the "max" (according
> to your app) buffer size so that the correct scaling factor is
> negotiated at connection setup time.

I see.  Well, we can always start with a very large SO_RCVBUF and to
hell with tuning TCP.  My only concern with that is that this may
reserve a large amount of memory, but the buffers should only ever get
really big in high-delay WAN situations.
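
Something like this is what I have in mind (just a sketch; the function
name and the 1MB figure are made up).  The important bit is that the
setsockopt() happens before connect(), since the scale factor is fixed
in the SYN exchange and can't be renegotiated later:

    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <unistd.h>

    int
    connect_with_big_rcvbuf(const struct sockaddr_in *sin)
    {
        int fd, rcvbuf = 1 << 20;       /* 1MB, arbitrary */

        if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
                return (-1);
        /* Must precede connect() so the window scale is negotiated. */
        (void) setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &rcvbuf,
            sizeof (rcvbuf));
        if (connect(fd, (const struct sockaddr *)sin,
            sizeof (*sin)) == -1) {
                (void) close(fd);
                return (-1);
        }
        return (fd);
    }

IIRC the kernel caps the request at tcp_max_buf anyway, so asking for
too much just gets silently trimmed.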

> > Second, we don't have to fix this anyways: since SSHv2 has its own
> > flow-control mechanisms we can still throttle the sender at the SSHv2
> > layer, even if we can't shrink SO_RCVBUF.
> > 
> > As long as we're able to open the throttle on TCP by enlarging the
> > SO_SND/RCVBUFs _and_ we can still slow down senders (we can) then we
> > should be OK, with some wasted memory IF the TCP stack reserves as much
> > memory as requested via SO_SND/RCVBUF.
> 
> Since the app will not see packet drops, the only possible
> indicator of congestion is the RTT of ssh data.  But this can

*Exactly*.  ssh/sshd could track the running average of RTTs over two
different time periods: when the short-term average is smaller than the
long-term one, available bandwidth is growing, and when it's higher, we
have congestion.
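
E.g., a sketch (the two gains, 1/8 and 1/64, are pulled out of thin
air and would need tuning):

    /* Two EWMAs of the measured RTT, one fast and one slow.
     * Short-term below long-term suggests spare bandwidth;
     * short-term above it suggests congestion. */
    typedef struct {
        double  rtt_short;      /* fast EWMA */
        double  rtt_long;       /* slow EWMA */
    } rtt_avg_t;

    void
    rtt_sample(rtt_avg_t *r, double rtt)
    {
        if (r->rtt_long == 0.0) {
                r->rtt_short = r->rtt_long = rtt;
                return;
        }
        r->rtt_short += (rtt - r->rtt_short) / 8.0;
        r->rtt_long += (rtt - r->rtt_long) / 64.0;
    }

    /* <0: bandwidth seems to be growing; >0: looks congested. */
    int
    rtt_trend(const rtt_avg_t *r)
    {
        if (r->rtt_short < r->rtt_long)
                return (-1);
        if (r->rtt_short > r->rtt_long)
                return (1);
        return (0);
    }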

> be misleading sometimes.  For example, TCP can recover from
> data loss without a timeout.  From the app's perspective, it
> may be seen as jitter.  So the app may be very slow to respond
> to congestion events.  And the app may need to second-guess
> the congestion window used by TCP, as this will affect the
> sending rate.  I suspect that it will be quite tricky to do it
> right at the app level.

Well, we're talking about bulk data transfers, so congestion will be
noticeable.  I'm more concerned about detecting when congestion is
resolved.  Also, the application will be able to measure both RTTs and
actual bandwidth for the connection.

Yes, getting this right will be tricky.  It doesn't help that we have
two layers of flow control.  But perhaps your comment about TCP buffer
sizing limitations is actually a boon in disguise: just don't auto-tune
TCP; start with very large TCP buffer sizes but small SSHv2 channel
windows, and slow-start those.
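
I.e., something like this (a sketch; all names and constants are
invented), driven by the RTT trend above:

    #include <inttypes.h>

    #define CHAN_WND_INIT   (32 * 1024)
    #define CHAN_WND_MAX    (8 * 1024 * 1024)  /* <= the TCP buffer */

    /* Slow-start the SSHv2 channel window at the app layer:
     * double while the path looks clean, halve on congestion. */
    uint32_t
    chan_wnd_update(uint32_t wnd, int congested)
    {
        if (congested)
                return (wnd / 2 > CHAN_WND_INIT ?
                    wnd / 2 : CHAN_WND_INIT);
        return (wnd * 2 < CHAN_WND_MAX ? wnd * 2 : CHAN_WND_MAX);
    }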

> > (b) in particular is a strong argument for auto-tuning TCP in ssh/sshd.
> 
> How does the current implementation of the flow control work?
> Does it partition the socket buffer into equal sizes for all
> the channels?  Or does it dynamically change that?

The implementation is really dumb: fixed window sizes with no relation
to the TCP buffer sizes.  (Actually the window size shrinks when the
sender sends data and grows when the receiver drains it, but it never
exceeds the original.)  See $SRC/cmd/ssh/libssh/common/channels.c, and
search for "adjust" case-insensitively -- it's pretty obvious.

The SSHv2 spec covering this (RFC 4254) allows the channel window size
to grow, and it would be silly to over-subscribe the connection's
buffers for long.  Each channel has an initial window size, sending
data consumes space from the window, and the receiver can send an
unsigned integer adjustment whenever it wants.
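
In outline the bookkeeping is just this (a sketch of the RFC 4254
mechanism, not the actual channels.c code; the names are invented):

    #include <inttypes.h>
    #include <stddef.h>

    typedef struct chan {
        uint32_t remote_window;   /* what we may still send */
        uint32_t local_window;    /* window we advertised to the peer */
        uint32_t local_consumed;  /* drained, not yet re-advertised */
    } chan_t;

    /* Sender side: sending data consumes remote window. */
    size_t
    chan_send_limit(chan_t *c, size_t want)
    {
        size_t len = (want < c->remote_window) ?
            want : c->remote_window;

        c->remote_window -= len;
        return (len);
    }

    /* Receiver side: after draining len bytes, re-advertise the
     * space with CHANNEL_WINDOW_ADJUST once half the window is
     * free; returns the adjustment to send, or 0. */
    uint32_t
    chan_drain(chan_t *c, uint32_t len)
    {
        uint32_t adjust = 0;

        c->local_consumed += len;
        if (c->local_consumed >= c->local_window / 2) {
                adjust = c->local_consumed;
                c->local_consumed = 0;
        }
        return (adjust);
    }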

Nico