On Wed, 10 Nov 2010, Mark Riordan wrote:

My upload test from Windows to Linux now works!

Oh. It proves we are at least still making progress...

The 19 MB debug log has 149009 "Would block" messages, which seems
excessive - one for every 4 bytes.

(Once for every 4 byte of _your_ payload data, yes, but SFTP, SSH channels and SSH transport will add a notiacble amount of data to send.)

It certainly seems a bit excessive yes. I know that we do a few too many "try this just in case" calls so that there can be a few too many per invoke of a the sftp_write function, but it does give me the feeling that we're not doing the waiting for the socket activity correctly.

A detail in the log:

We get excessive -37 logged, but they are all "lumped up" and get done on *the same microsecond*. See for example at the time 0.414451 where 966 lines are reported on the same timestamp. Then there's a 16 millisecond pause, where presumably the code waits for the socket to turn writable again. Then at 0.430076 there's an additional 1320 lines logged at *the same microsecond*.

I figure the GetSystemTimeAsFileTime() function we use for this simply has this low resolution so we get this inexactness in the timings.

My application recorded that calls to libssh2_sftp_write generally resulted in 4000 or 8000 bytes written. Never less than 500.

Right, as you use a 32500 bytes buffer and libssh2 will bite off 4000 bytes at a time from that so the SFTP packets will become 4000*8 + 500*1 (of payload). How many bytes of those that are actually sent off depends on the window size in the SSH channel and how much the TCP layer accepts. You will probably gain an unmeasureably small amount of speed by using even 4000 byte blocks.

I've just now pushed yet another change to the sftp_write code that concerns situations where more data is ACKed already than the buffer size passed in to the function as that would cause major weirdness. I don't think you've seen effects of this, as I believe things would go down the drain hard if so.

Some ideas on how to move further from here...

- Start using a smaller test file. Your problem is probably just as visible with a 100K file. Or perhaps you should even try a much smaller, like a 10K one to see how that behaves.

- Since we're now down to "just" being slow, you can try to add more output logs again to see if they reveal anything.

- We need to figure out where the EAGAINs come from and make sure libssh2 acts correctly on them, but it a bit hard when the logging slows things down this much. Possibly you should make your example use the libssh2_trace_sethandler() function to get the tracing done by your own function, and then you just keep all the logging in memory until the test run is over and then save it to disk as that is probably gonna be A LOT faster although of course quite memory consuming.

--

 / daniel.haxx.se
_______________________________________________
libssh2-devel http://cool.haxx.se/cgi-bin/mailman/listinfo/libssh2-devel

Reply via email to