I tracked the problem down. Depending on how you look at it, it is a bug in openssh's sftp or in sshfs.
The problem is that when sending a file data, sshfs will just keep sending without waiting for the server to catch up. This does not need to be a problem, as the sftp server should just stop receiving TCP packets when the buffers are full, causing TCP resends and automatically slowing down the sending. However, the openssh sftp server loop (openssh:sftp-server.c) will just keep trying to read from stdin. It will read from stdin to its input buffer at most once per loop, and dispatch at most one sftp packet per loop. But as each read from stdin can read up to 8 sftp packets into the input buffer then the input buffer risks running out of space. When it runs out of space then it just kills itself (openssh:buffer.c:buffer_append_space). I note that the openssh sftp client has a mechanism to limit the number of unanswered requests, which probably means unlike sshfs, it is not affected. I found that a reliable way to trigger this bug was to start a parallel process which hogs the disk, causing the consuming sftp server loop to slow down. -sshfs-mount localhost:/tmp test -ionice -c1 cp large_file /tmp/deleteme -dd if=/dev/zero of=test/deleteme2 The error I get is dd: writing to `test/deleteme2': Input/output error dd: closing output file `test/deleteme2': Transport endpoint is not connected It is somewhat a matter of opinion, but I would say that it is openssh which should be fixed. If I look at http://www.openssh.org/txt/draft-ietf-secsh-filexfer-02.txt (which, however, does not seem to be the protocol actually implemented by openssh sftp AFAICT), it says that There is no limit on the number of outstanding (non-acknowledged) requests that the client may send to the server. In practice this is limited by the buffering available on the data stream and the queuing performed by the server. If the server's queues are full, it should not read any more data from the stream, and flow control will prevent the client from sending more requests. Note, however, that while there is no restriction on the protocol level, the client's API may provide a limit in order to prevent infinite queuing of outgoing requests at the client. I have implemented a fix to conform to this, which applies to openssh_4.3p2-8 . I will submit that fix elsewhere, but I include it here for convenience. Regards, Thue diff -ur lala2/openssh-4.3p2/buffer.c lala/openssh-4.3p2/buffer.c --- lala2/openssh-4.3p2/buffer.c 2005-03-14 13:22:26.000000000 +0100 +++ lala/openssh-4.3p2/buffer.c 2007-02-18 16:02:41.000000000 +0100 @@ -66,6 +66,17 @@ memcpy(p, data, len); } +/* Shuffle data to the start of the buffer. */ + +static void +buffer_defragment(Buffer *buffer) +{ + memmove(buffer->buf, buffer->buf + buffer->offset, + buffer->end - buffer->offset); + buffer->end -= buffer->offset; + buffer->offset = 0; +} + /* * Appends space to the buffer, expanding the buffer if necessary. This does * not actually copy the data into the buffer, but instead returns a pointer @@ -98,18 +109,21 @@ * data to the beginning and retry. */ if (buffer->offset > MIN(buffer->alloc, BUFFER_MAX_CHUNK)) { - memmove(buffer->buf, buffer->buf + buffer->offset, - buffer->end - buffer->offset); - buffer->end -= buffer->offset; - buffer->offset = 0; + buffer_defragment(buffer); goto restart; } - /* Increase the size of the buffer and retry. */ + /* Increase the size of the buffer and retry. */ newlen = buffer->alloc + len + 32768; - if (newlen > BUFFER_MAX_LEN) - fatal("buffer_append_space: alloc %u not supported", - newlen); + if (newlen > BUFFER_MAX_LEN) { + if (buffer->offset > 0) { + buffer_defragment(buffer); + goto restart; + } else { + fatal("buffer_append_space: alloc %u not supported", + newlen); + } + } buffer->buf = xrealloc(buffer->buf, newlen); buffer->alloc = newlen; goto restart; @@ -124,6 +138,14 @@ return buffer->end - buffer->offset; } +/* The maximum potential space left in buffer. */ + +u_int +buffer_potential_free_space(Buffer *buffer) +{ + return BUFFER_MAX_LEN - buffer_len(buffer); +} + /* Gets data from the beginning of the buffer. */ int diff -ur lala2/openssh-4.3p2/buffer.h lala/openssh-4.3p2/buffer.h --- lala2/openssh-4.3p2/buffer.h 2005-03-14 13:22:26.000000000 +0100 +++ lala/openssh-4.3p2/buffer.h 2007-02-18 16:03:21.000000000 +0100 @@ -31,6 +31,7 @@ void buffer_free(Buffer *); u_int buffer_len(Buffer *); +u_int buffer_potential_free_space(Buffer *); void *buffer_ptr(Buffer *); void buffer_append(Buffer *, const void *, u_int); diff -ur lala2/openssh-4.3p2/sftp-server.c lala/openssh-4.3p2/sftp-server.c --- lala2/openssh-4.3p2/sftp-server.c 2006-01-02 13:40:51.000000000 +0100 +++ lala/openssh-4.3p2/sftp-server.c 2007-02-18 16:22:05.000000000 +0100 @@ -1074,7 +1074,10 @@ memset(rset, 0, set_size); memset(wset, 0, set_size); - FD_SET(in, rset); + /* If the oqueue is close to full then we want to wait on just the output. */ + if (buffer_potential_free_space(&oqueue) > SFTP_MAX_MSG_LENGTH + 4) + FD_SET(in, rset); + olen = buffer_len(&oqueue); if (olen > 0) FD_SET(out, wset); @@ -1086,7 +1089,8 @@ } /* copy stdin to iqueue */ - if (FD_ISSET(in, rset)) { + if (buffer_potential_free_space(&iqueue) > SFTP_MAX_MSG_LENGTH + 4 && + FD_ISSET(in, rset)) { char buf[4*4096]; len = read(in, buf, sizeof buf); if (len == 0) { @@ -1109,7 +1113,10 @@ buffer_consume(&oqueue, len); } } - /* process requests from client */ - process(); + /* process requests from client. If the output buffer + * is critical then don't create more data by + * processing more requests. */ + if (buffer_potential_free_space(&oqueue) > SFTP_MAX_MSG_LENGTH + 4) + process(); } } -- To UNSUBSCRIBE, email to [EMAIL PROTECTED] with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]

