tomaswolf commented on issue #854:
URL: https://github.com/apache/mina-sshd/issues/854#issuecomment-3575522132
This way of downloading a file is always going to be slow.
First, a buffer of 10MB is not going to help because servers typically don't
return that much data in a single read request. More likely you'll be getting
much less per read request; normally less than 256kB and typically 32kB or
maybe 64kB. Servers do this to guard against out-of-memory conditions under
heavy load.
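
To illustrate the first point, here is a minimal, self-contained sketch that does not use SFTP at all. `CappedStream` is a hypothetical stand-in for a remote file whose server never returns more than 32 KiB per read (an assumed value, taken from the typical sizes mentioned above). It counts how many read calls are needed to drain 1 MiB with a small and with a very large buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

final class ShortReadDemo {

    /** Hypothetical stand-in for a remote file: returns at most maxChunk bytes per read. */
    static final class CappedStream extends InputStream {
        private final InputStream delegate;
        private final int maxChunk;

        CappedStream(InputStream delegate, int maxChunk) {
            this.delegate = delegate;
            this.maxChunk = maxChunk;
        }

        @Override
        public int read() throws IOException {
            return delegate.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            // The caller may ask for more, but never gets more than maxChunk bytes.
            return delegate.read(b, off, Math.min(len, maxChunk));
        }
    }

    /** Counts the read calls that return data when draining the stream with the given buffer size. */
    static int countReads(InputStream in, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        int calls = 0;
        while (in.read(buffer) != -1) {
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1024 * 1024]; // 1 MiB of dummy content
        int cap = 32 * 1024;                 // assumed per-read limit of the "server"
        int small = countReads(new CappedStream(new ByteArrayInputStream(data), cap), 32 * 1024);
        int large = countReads(new CappedStream(new ByteArrayInputStream(data), cap), 10 * 1024 * 1024);
        System.out.println(small + " reads with a 32 KiB buffer, " + large + " reads with a 10 MiB buffer");
    }
}
```

Under these assumptions both runs need the same number of reads (32), so the larger buffer only costs memory.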
Second, this code will fire off a read request and then wait for the
response to arrive before it sends the next read request. So it incurs the full
network latency on each request.
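
A back-of-the-envelope estimate shows how much this costs. The sketch below is a simplified model, not a measurement: it ignores bandwidth and server processing time and only adds up round trips. The file size, chunk size and round-trip time are hypothetical values chosen for illustration.

```java
final class LatencyEstimate {

    /** Number of read requests needed when each response carries at most chunkSize bytes. */
    static long requests(long fileSize, long chunkSize) {
        return (fileSize + chunkSize - 1) / chunkSize; // ceiling division
    }

    /** Time spent only on waiting for responses when requests are sent strictly one after another. */
    static double sequentialWaitSeconds(long fileSize, long chunkSize, double roundTripSeconds) {
        return requests(fileSize, chunkSize) * roundTripSeconds;
    }

    public static void main(String[] args) {
        long fileSize = 1L << 30;  // 1 GiB (hypothetical)
        long chunkSize = 32 * 1024; // 32 KiB per response (assumed)
        double rtt = 0.020;         // 20 ms round-trip time (hypothetical)
        System.out.printf("%d requests, %.1f s spent waiting on round trips%n",
                requests(fileSize, chunkSize),
                sequentialWaitSeconds(fileSize, chunkSize, rtt));
    }
}
```

With these numbers that is 32768 requests and roughly 655 seconds of pure waiting, regardless of how fast the link is.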
Third, although I don't see any multi-threading in this code: trying to
download a single file using multiple threads is unlikely to give any speed-up.
For one, all requests and data are sent over a single network connection, so some
serialization will occur at the SSH level anyway. For another, let's assume you
have one thread downloading the block from file offset 0 to 4'999'999, another
thread downloading the block from 5'000'000 to 9'999'999 and a third thread
downloading 10'000'000 to 14'999'999. When these threads receive data, they'd
have to write it at the correct offsets in the local file. So you need random
access on the local file, and you'd be jumping around, frequently resetting
file offsets. That is going to kill performance when writing to the local file.
On the server side, if the read requests end up operating on the same
handle/file object, such re-positioning may also occur, making things even
worse.
I suggest you use something like
```java
try (InputStream in = sftpClient.read(filename)) {
    // 'file' is the java.nio.file.Path of the local target file
    Files.copy(in, file, StandardCopyOption.REPLACE_EXISTING);
}
```
to download a file. This
* uses a reasonable buffer size internally,
* sends off multiple read requests and handles the responses when they come
  in, amortizing network latency,
* writes the local file sequentially, avoiding performance problems with
  file positioning in the local file.
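
The stream-copy pattern can be wrapped in a small helper. The following is only a sketch: `RemoteOpener` and `download` are hypothetical names introduced here, and the `main` method uses an in-memory stream instead of a real SFTP connection so that it runs on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamDownload {

    /** Hypothetical abstraction for "open the remote file as an input stream". */
    @FunctionalInterface
    interface RemoteOpener {
        InputStream open(String remotePath) throws IOException;
    }

    /** Copies the remote file to target front to back and returns the number of bytes written. */
    static long download(RemoteOpener opener, String remotePath, Path target) throws IOException {
        try (InputStream in = opener.open(remotePath)) {
            // Files.copy drains the stream sequentially; no seeking in the local file.
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fakeRemoteContent = new byte[100_000]; // stand-in for the remote file
        Path target = Files.createTempFile("download", ".bin");
        long written = download(path -> new ByteArrayInputStream(fakeRemoteContent),
                "/remote/file.bin", target);
        System.out.println(written + " bytes written to " + target);
        Files.delete(target);
    }
}
```

With a real client, the opener would presumably be something like `path -> sftpClient.read(path)`.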
You might also want to look at the upload tests in project
`sshd-benchmarks`. They show various ways to upload a file; analogous
ways to download files also exist.
It might make sense to download multiple files using multiple threads (one
thread per file). This may give a small speed improvement if network
operations can be overlapped with local file writing in other threads, but it
may not bring much, because in the end it's still a single network connection.
In any case it will complicate thread synchronization (waiting until all
downloads are done; error handling if some downloads fail).
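
To give an idea of the synchronization involved, here is a minimal sketch of the one-thread-per-file approach using a plain `ExecutorService`. `FileDownloader` and `downloadAll` are hypothetical names; how the downloader obtains its SFTP client (shared or one per task) is deliberately left open, and the `main` method uses a stand-in that only writes local files.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelFileDownloads {

    /** Hypothetical abstraction for "download one remote file to a local path". */
    @FunctionalInterface
    interface FileDownloader {
        void download(String remotePath, Path target) throws IOException;
    }

    /** Runs one download task per file and returns the failures, keyed by remote path. */
    static Map<String, Throwable> downloadAll(FileDownloader downloader,
            Map<String, Path> files, int threads) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            Map<String, Future<?>> pending = new LinkedHashMap<>();
            for (Map.Entry<String, Path> e : files.entrySet()) {
                pending.put(e.getKey(), pool.submit(() -> {
                    downloader.download(e.getKey(), e.getValue());
                    return null; // makes this a Callable, so IOException may propagate
                }));
            }
            Map<String, Throwable> failures = new LinkedHashMap<>();
            for (Map.Entry<String, Future<?>> p : pending.entrySet()) {
                try {
                    p.getValue().get(); // wait until this download is done
                } catch (ExecutionException ex) {
                    failures.put(p.getKey(), ex.getCause()); // record, keep waiting for the rest
                }
            }
            return failures;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("downloads");
        Map<String, Path> files = new LinkedHashMap<>();
        files.put("a.txt", dir.resolve("a.txt"));
        files.put("missing.txt", dir.resolve("missing.txt"));
        // Stand-in downloader: writes the name as content, fails for "missing.txt".
        Map<String, Throwable> failures = downloadAll((remote, target) -> {
            if (remote.startsWith("missing")) {
                throw new IOException("simulated failure for " + remote);
            }
            Files.write(target, remote.getBytes(StandardCharsets.UTF_8));
        }, files, 2);
        System.out.println("Failed downloads: " + failures.keySet());
    }
}
```

Even in this small form, the caller still has to decide what to do with partial results when some downloads fail.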