Olaf Otto created WAGON-537:
-------------------------------
Summary: Maven download speed of large artifacts is slow due to
unsuitable buffer strategy for remote Artifacts in AbstractWagon
Key: WAGON-537
URL: https://issues.apache.org/jira/browse/WAGON-537
Project: Maven Wagon
Issue Type: Improvement
Components: wagon-provider-api
Affects Versions: 3.2.0
Environment: Windows 10, JDK 1.8, Nexus Artifact store > 100MB/s
network connection.
Reporter: Olaf Otto
Attachments: wagon-issue.png
We are using Maven for build process automation with Docker. This sometimes
involves downloading images a few gigabytes in size. Here, Maven's download
speed is consistently and reproducibly slow. For instance, a 7.5 GB artifact
took almost two hours to transfer despite a 100 MB/s connection, whereas a
browser download of the same artifact from the remote Nexus artifact
repository reproducibly reaches the corresponding speed.
I have investigated the issue using JProfiler. The result clearly shows a
significant issue in AbstractWagon's transfer( Resource resource, InputStream
input, OutputStream output, int requestType, long maxSize ) method used for
remote artifacts.
Here, the input stream is read in a loop using a 4 KB buffer. Whenever data is
received, the received data is pushed to downstream listeners via
fireTransferProgress. These listeners (or rather consumers) perform expensive
tasks such as checksumming or printing to console.
Now, the underlying InputStream implementation used in transfer will return
from calls to read(buffer, offset, length) as soon as *some* data is available. That
is, fireTransferProgress is invoked with an average number of bytes less than
half the buffer capacity (this varies with the underlying network and hardware
architecture). Consequently, fireTransferProgress is invoked *millions of
times* for large files. As this is a blocking operation, the time spent in
fireTransferProgress dominates and drastically slows down the transfer by at
least one order of magnitude.
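The effect described above can be reproduced in isolation. The following is a
minimal sketch, not the actual AbstractWagon source: ProgressListener and
shortReads are hypothetical stand-ins for Wagon's transfer listeners and for a
network stream that hands back only part of the requested bytes per read call.
The 1500-byte chunk size is an arbitrary assumption chosen to stay below half
of the 4 KB buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class NaiveTransferSketch {

    /** Hypothetical stand-in for the listeners reached via fireTransferProgress. */
    public interface ProgressListener {
        void progress(byte[] buffer, int length);
    }

    /** Simulates a stream that returns at most maxChunk bytes per read call. */
    public static InputStream shortReads(byte[] data, final int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    /** Reads with a 4 KB buffer and notifies the listener after every read call. */
    public static long transfer(InputStream in, OutputStream out, ProgressListener listener)
            throws IOException {
        byte[] buffer = new byte[4096];
        long notifications = 0;
        int n;
        while ((n = in.read(buffer, 0, buffer.length)) != -1) {
            out.write(buffer, 0, n);
            // One listener call per read, no matter how few bytes arrived.
            listener.progress(buffer, n);
            notifications++;
        }
        return notifications;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of dummy payload
        long notifications =
                transfer(shortReads(data, 1500), new ByteArrayOutputStream(), (b, n) -> { });
        System.out.println("listener invocations: " + notifications);
    }
}
```

With 1 MiB of data and 1500-byte reads, the listener is invoked 700 times,
compared with 256 invocations if every read filled the 4 KB buffer. The gap
grows linearly with the file size.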
!wagon-issue.png!
In our case, we found download speed reduced from a theoretical optimum of ~80
seconds to more than 3200 seconds.
From an architectural perspective, I would not want to make the consumers /
listeners invoked via fireTransferProgress aware of their potential impact on
download speed, but rather refactor the transfer method such that it uses a
buffer strategy reducing the number of fireTransferProgress invocations.
This should be done with regard to the expected file size of the transfer,
such that fireTransferProgress is invoked often enough but not too frequently.
I have implemented a solution and transfer speed went up more than one order of
magnitude. I will provide a pull request asap.
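One possible shape of such a buffer strategy is sketched below. This is an
illustration under stated assumptions, not the code of the announced pull
request: the class, the ProgressListener interface, the shortReads helper, the
buffer bounds (4 KB to 512 KB) and the target of roughly 100 notifications per
transfer are all hypothetical. The idea is to size the buffer from the expected
file size and to keep reading until the buffer is full before notifying
listeners.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

final class BufferedTransferSketch {

    /** Hypothetical stand-in for the listeners reached via fireTransferProgress. */
    public interface ProgressListener {
        void progress(byte[] buffer, int length);
    }

    // Assumed bounds and target; the values are illustrative only.
    static final int MIN_BUFFER = 4 * 1024;
    static final int MAX_BUFFER = 512 * 1024;
    static final int TARGET_NOTIFICATIONS = 100;

    /** Picks a buffer size from the expected transfer size; unknown size (<= 0) uses the minimum. */
    public static int bufferSizeFor(long expectedSize) {
        if (expectedSize <= 0) {
            return MIN_BUFFER;
        }
        long size = expectedSize / TARGET_NOTIFICATIONS;
        return (int) Math.max(MIN_BUFFER, Math.min(MAX_BUFFER, size));
    }

    /** Fills the buffer completely (or up to end of stream) before each notification. */
    public static long transfer(InputStream in, OutputStream out, long expectedSize,
                                ProgressListener listener) throws IOException {
        byte[] buffer = new byte[bufferSizeFor(expectedSize)];
        long notifications = 0;
        while (true) {
            int filled = 0;
            int n = 0;
            // Keep reading until the buffer is full or the stream ends.
            while (filled < buffer.length
                    && (n = in.read(buffer, filled, buffer.length - filled)) != -1) {
                filled += n;
            }
            if (filled > 0) {
                out.write(buffer, 0, filled);
                listener.progress(buffer, filled);
                notifications++;
            }
            if (n == -1) {
                break;
            }
        }
        return notifications;
    }

    /** Simulates a stream that returns at most maxChunk bytes per read call. */
    public static InputStream shortReads(byte[] data, final int maxChunk) {
        return new ByteArrayInputStream(data) {
            @Override
            public synchronized int read(byte[] b, int off, int len) {
                return super.read(b, off, Math.min(len, maxChunk));
            }
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[1 << 20]; // 1 MiB of dummy payload
        long notifications = transfer(shortReads(data, 1500), new ByteArrayOutputStream(),
                data.length, (b, n) -> { });
        System.out.println("listener invocations: " + notifications);
    }
}
```

For the same 1 MiB payload delivered in 1500-byte reads, this sketch notifies
the listener 101 times. A trade-off of filling the buffer first is that
progress updates arrive less often on slow connections, which is why the
buffer size is capped.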
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)