Hi Daniel (and all),
The pipelining API I created has two calls: one that initiates the
send/request of data, and one that handles the ack/response. It's modeled
loosely on the API in libssh and code found in openssh. The API looks like
this:
LIBSSH2_API ssize_t
libssh2_sftp_write_async_begin(LIBSSH2_SFTP_HANDLE *handle,
                               libssh2_uint64_t offset,
                               char *buffer, size_t buffer_len,
                               unsigned long *request_id);

LIBSSH2_API int
libssh2_sftp_write_async(LIBSSH2_SFTP_HANDLE *handle,
                         unsigned long request_id);

LIBSSH2_API int
libssh2_sftp_read_async_begin(LIBSSH2_SFTP_HANDLE *handle,
                              libssh2_uint64_t offset,
                              size_t buffer_maxlen,
                              unsigned long *out_request_id);

LIBSSH2_API int
libssh2_sftp_read_async(LIBSSH2_SFTP_HANDLE *handle,
                        char *buffer, size_t buf_len,
                        unsigned long request_id);
For uploading, the write_async_begin call submits a buffer of a given size.
Handling of the full buffer is not guaranteed, so the ack call write_async
returns the number of bytes actually handled. It is then up to the caller to
re-send the remaining buffer at the correct offset. Writes are paired using
the request ID.
For downloading, it's the same concept: read_async_begin requests a given
number of bytes at a given offset, and read_async then returns the data. This
may be fewer bytes than were requested, so the caller needs to re-request the
remaining bytes at the correct offset. Reads are paired using the request ID.
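To make the caller-side bookkeeping concrete, here's a minimal sketch of a pipelined upload loop in the shape the API above implies. The stub_* transport functions are invented stand-ins (a real caller would use libssh2_sftp_write_async_begin()/libssh2_sftp_write_async() on a handle); the stub "server" acks at most 32 KB per request so short writes actually occur:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the _begin call: queue a write, hand back a request ID. */
static int stub_write_async_begin(uint64_t offset, const char *buf,
                                  size_t len, unsigned long *req_id)
{
    static unsigned long next_id = 1;
    (void)offset; (void)buf; (void)len;
    *req_id = next_id++;
    return 0;
}

/* Stand-in for the ack call: the "server" takes at most 32 KB per request. */
static size_t stub_write_async(unsigned long req_id, size_t requested)
{
    (void)req_id;
    return requested < 32768 ? requested : 32768;
}

#define DEPTH 4                    /* requests kept in flight */
#define CHUNK (64 * 1024)          /* bytes per request */

struct req { unsigned long id; uint64_t off; size_t len; };

/* Upload `total` bytes of `data`, keeping up to DEPTH writes in flight
 * and re-sending the tail of any short write at the correct offset. */
size_t pipelined_upload(const char *data, size_t total)
{
    struct req inflight[DEPTH];
    int head = 0, count = 0;
    uint64_t next_off = 0;         /* first byte not yet sent */
    size_t acked_total = 0;

    while (acked_total < total) {
        /* keep the pipeline full */
        while (count < DEPTH && next_off < total) {
            struct req *r = &inflight[(head + count) % DEPTH];
            r->off = next_off;
            r->len = total - next_off > CHUNK ? CHUNK
                                              : (size_t)(total - next_off);
            stub_write_async_begin(r->off, data + r->off, r->len, &r->id);
            next_off += r->len;
            count++;
        }
        /* drain the oldest ack */
        struct req *r = &inflight[head];
        size_t acked = stub_write_async(r->id, r->len);
        acked_total += acked;
        head = (head + 1) % DEPTH;
        count--;
        if (acked < r->len) {
            /* short write: re-send the remainder at the right offset */
            struct req tail = { 0, r->off + acked, r->len - acked };
            stub_write_async_begin(tail.off, data + tail.off, tail.len,
                                   &tail.id);
            inflight[(head + count) % DEPTH] = tail;
            count++;
        }
    }
    return acked_total;
}

static char upload_src[200 * 1024];  /* pretend file contents */
```

The DEPTH and CHUNK values are arbitrary here; the point is only that the caller, not the library, owns the offset tracking and the in-flight queue.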
The advantage of this is that you can call write_async_begin, say, 10 times in
a row and then start draining the acks with write_async. This minimizes
network latency (i.e. sitting in a select() call) and does a better job of
maxing out the available bandwidth. The same goes for downloading. The
advantage of this method over the new write pipelining on the 1.2.8 branch is
that you don't have to pre-read a large buffer of data into memory. The
disadvantage is that it's more leg-work for the implementor, who has to track
the offsets and drain the replies manually. Thinking out loud, it might be
worth adding a convenience API that takes a file path and does all of this
behind the scenes, like openssh's do_upload/do_download routines do.
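On the read side, the re-request bookkeeping that a do_download-style convenience routine would hide looks roughly like this. Again the stub_* functions are invented stand-ins for the proposed calls, with the stub transport returning at most 16 KB per request so short reads occur:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for read_async_begin: request bytes, hand back a request ID. */
static int stub_read_async_begin(uint64_t offset, size_t maxlen,
                                 unsigned long *req_id)
{
    static unsigned long next_id = 1;
    (void)offset; (void)maxlen;
    *req_id = next_id++;
    return 0;
}

/* Stand-in for read_async: delivers at most 16 KB of "file data". */
static size_t stub_read_async(char *buf, size_t buf_len, unsigned long req_id)
{
    size_t n = buf_len < 16384 ? buf_len : 16384;
    (void)req_id;
    memset(buf, 'x', n);           /* pretend data arrived */
    return n;
}

/* Fill `dst` with `len` bytes starting at file offset `off`,
 * re-requesting the remainder after each short read. */
size_t fetch_region(char *dst, uint64_t off, size_t len)
{
    size_t got = 0;
    while (got < len) {
        unsigned long id;
        size_t n;
        stub_read_async_begin(off + got, len - got, &id);
        n = stub_read_async(dst + got, len - got, id);
        if (n == 0)
            break;                 /* EOF or error: stop */
        got += n;
    }
    return got;
}

static char download_dst[100 * 1024];
```

A convenience wrapper would combine a loop like this with the pipelined drain shown for uploads, issuing several read_async_begin calls ahead of the read_async calls.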
Now for the ever-so-important speed improvements. I'm testing against an
internal server (RAID'ed) over gigabit ethernet. My before benchmark was
about 12 MB/sec upload using the 1.2.7 stable release. My after is about 53
MB/sec upload. When shelling out to openssh's sftp, I can get about 73
MB/sec. Ideally I'd like to get libssh2 up to openssh's performance level,
but that's for another day.
Cheers,
Will
On Nov 29, 2010, at 5:09 AM, Daniel Stenberg wrote:
> On Wed, 17 Nov 2010, Will Cosgrove wrote:
>
>> The reason I'm posting is I recently added my own upload/download pipelining
>> API to libssh2 modeled after libssh's download pipeline API. It pushes a
>> bit more state management onto the user, but it seems to work fairly well
>> and doesn't require a large input buffer to be filled before hand. I was
>> wondering if there was any interest in my additions (which admittedly need
>> to be code-reviewed by someone more familiar to libssh2 than me) or if this
>> current method was going to be applied to downloading at some point and I
>> should just keep my changes to my projects.
>
> I might be interested. I have no idea how that pipeline API works so you'd
> have to explain that, and then I would also like to see some numbers or
> metrics that show it making a difference!
>
> --
>
> / daniel.haxx.se
> _______________________________________________
> libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel