Hi Daniel (and all),
The pipelining API I created has two calls: one that initiates the
send/request of data, and one that handles the ack/response. It's modeled
loosely on the API in libssh and code found in openssh. The API looks like
this:
LIBSSH2_API ssize_t
libssh2_sftp_write_async_begin(LIBSSH2_SFTP_HANDLE *handle,
                               libssh2_uint64_t offset,
                               char *buffer, size_t buffer_len,
                               unsigned long *request_id);

LIBSSH2_API int
libssh2_sftp_write_async(LIBSSH2_SFTP_HANDLE *handle,
                         unsigned long request_id);

LIBSSH2_API int
libssh2_sftp_read_async_begin(LIBSSH2_SFTP_HANDLE *handle,
                              libssh2_uint64_t offset,
                              size_t buffer_maxlen,
                              unsigned long *out_request_id);

LIBSSH2_API int
libssh2_sftp_read_async(LIBSSH2_SFTP_HANDLE *handle,
                        char *buffer, size_t buf_len,
                        unsigned long request_id);
For uploading, the write_async_begin call submits a buffer of a given size.
Handling of the full buffer is not guaranteed, so the ack call write_async
returns the number of bytes actually handled. It is then up to the caller to
re-send the remaining buffer at the correct offset. Writes are paired using
the request ID.
For downloading, it's the same concept: read_async_begin requests a given
number of bytes at a given offset, and read_async then returns the data. This
may be fewer bytes than were requested, so the caller needs to re-request the
remaining bytes at the correct offset. Reads are paired using the request ID.
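To make the caller-side bookkeeping concrete, here's a minimal sketch of a pipelined upload loop in the shape the API above implies. The stub_* transport functions are invented stand-ins (a real caller would use libssh2_sftp_write_async_begin()/libssh2_sftp_write_async() on a handle); the stub "server" acks at most 32 KB per request so short writes actually occur:

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the _begin call: queue a write, hand back a request ID. */
static int stub_write_async_begin(uint64_t offset, const char *buf,
                                  size_t len, unsigned long *req_id)
{
    static unsigned long next_id = 1;
    (void)offset; (void)buf; (void)len;
    *req_id = next_id++;
    return 0;
}

/* Stand-in for the ack call: the "server" takes at most 32 KB per request. */
static size_t stub_write_async(unsigned long req_id, size_t requested)
{
    (void)req_id;
    return requested < 32768 ? requested : 32768;
}

#define DEPTH 4                    /* requests kept in flight */
#define CHUNK (64 * 1024)          /* bytes per request */

struct req { unsigned long id; uint64_t off; size_t len; };

/* Upload `total` bytes of `data`, keeping up to DEPTH writes in flight
 * and re-sending the tail of any short write at the correct offset. */
size_t pipelined_upload(const char *data, size_t total)
{
    struct req inflight[DEPTH];
    int head = 0, count = 0;
    uint64_t next_off = 0;         /* first byte not yet sent */
    size_t acked_total = 0;

    while (acked_total < total) {
        /* keep the pipeline full */
        while (count < DEPTH && next_off < total) {
            struct req *r = &inflight[(head + count) % DEPTH];
            r->off = next_off;
            r->len = total - next_off > CHUNK ? CHUNK
                                              : (size_t)(total - next_off);
            stub_write_async_begin(r->off, data + r->off, r->len, &r->id);
            next_off += r->len;
            count++;
        }
        /* drain the oldest ack */
        struct req *r = &inflight[head];
        size_t acked = stub_write_async(r->id, r->len);
        acked_total += acked;
        head = (head + 1) % DEPTH;
        count--;
        if (acked < r->len) {
            /* short write: re-send the remainder at the right offset */
            struct req tail = { 0, r->off + acked, r->len - acked };
            stub_write_async_begin(tail.off, data + tail.off, tail.len,
                                   &tail.id);
            inflight[(head + count) % DEPTH] = tail;
            count++;
        }
    }
    return acked_total;
}

static char upload_src[200 * 1024];  /* pretend file contents */
```

The DEPTH and CHUNK values are arbitrary here; the point is only that the caller, not the library, owns the offset tracking and the in-flight queue.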
The advantage of this is that you can call write_async_begin, say, 10 times in
a row and then start draining the acks with write_async. This minimizes
network latency (i.e. sitting in a select() call) and does a better job of
maxing out the available bandwidth. The same goes for downloading. The
advantage of this method over the new write pipelining on the 1.2.8 branch is
that you don't have to pre-read a large buffer of data into memory. The
disadvantage is that it's more leg-work for the implementor, who has to track
the offsets and drain the replies manually. Thinking out loud, it might be
worth adding a convenience API that takes a file path and does all of this
behind the scenes, like openssh's do_upload/do_download routines do.
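On the read side, the re-request bookkeeping that a do_download-style convenience routine would hide looks roughly like this. Again the stub_* functions are invented stand-ins for the proposed calls, with the stub transport returning at most 16 KB per request so short reads occur:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for read_async_begin: request bytes, hand back a request ID. */
static int stub_read_async_begin(uint64_t offset, size_t maxlen,
                                 unsigned long *req_id)
{
    static unsigned long next_id = 1;
    (void)offset; (void)maxlen;
    *req_id = next_id++;
    return 0;
}

/* Stand-in for read_async: delivers at most 16 KB of "file data". */
static size_t stub_read_async(char *buf, size_t buf_len, unsigned long req_id)
{
    size_t n = buf_len < 16384 ? buf_len : 16384;
    (void)req_id;
    memset(buf, 'x', n);           /* pretend data arrived */
    return n;
}

/* Fill `dst` with `len` bytes starting at file offset `off`,
 * re-requesting the remainder after each short read. */
size_t fetch_region(char *dst, uint64_t off, size_t len)
{
    size_t got = 0;
    while (got < len) {
        unsigned long id;
        size_t n;
        stub_read_async_begin(off + got, len - got, &id);
        n = stub_read_async(dst + got, len - got, id);
        if (n == 0)
            break;                 /* EOF or error: stop */
        got += n;
    }
    return got;
}

static char download_dst[100 * 1024];
```

A convenience wrapper would combine a loop like this with the pipelined drain shown for uploads, issuing several read_async_begin calls ahead of the read_async calls.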
Now for the ever-so-important speed improvements. I'm testing against an
internal server (RAID'ed) over gigabit ethernet. My before benchmark was
about 12 MB/sec upload using the 1.2.7 stable release. My after is about 53
MB/sec upload. When shelling out to openssh's sftp, I can get about 73
MB/sec. Ideally I'd like to get libssh2 up to openssh's performance level,
but that's for another day.
Cheers,
Will
On Nov 29, 2010, at 5:09 AM, Daniel Stenberg wrote:
> On Wed, 17 Nov 2010, Will Cosgrove wrote:
>
>> The reason I'm posting is I recently added my own upload/download pipelining
>> API to libssh2 modeled after libssh's download pipeline API. It pushes a
>> bit more state management onto the user, but it seems to work fairly well
>> and doesn't require a large input buffer to be filled before hand. I was
>> wondering if there was any interest in my additions (which admittedly need
>> to be code-reviewed by someone more familiar to libssh2 than me) or if this
>> current method was going to be applied to downloading at some point and I
>> should just keep my changes to my projects.
>
> I might be interested. I have no idea how that pipeline API works so you'd
> have to explain that, and then I would also like to see some numbers or
> metrics that show it making a difference!
>
> --
>
> / daniel.haxx.se
> _______________________________________________
> libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel