olaf-otto commented on a change in pull request #50: WAGON-537 Maven download
speed of large artifacts is slow
URL: https://github.com/apache/maven-wagon/pull/50#discussion_r225110395
##########
File path:
wagon-provider-api/src/main/java/org/apache/maven/wagon/AbstractWagon.java
##########
@@ -560,31 +564,78 @@ protected void transfer( Resource resource, InputStream
input, OutputStream outp
protected void transfer( Resource resource, InputStream input,
OutputStream output, int requestType, long maxSize )
throws IOException
{
- byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
+ byte[] buffer = bufferForTransferring( resource );
TransferEvent transferEvent = new TransferEvent( this, resource,
TransferEvent.TRANSFER_PROGRESS, requestType );
transferEvent.setTimestamp( System.currentTimeMillis() );
long remaining = maxSize;
while ( remaining > 0 )
{
- // let's safely cast to int because the min value will be lower
than the buffer size.
- int n = input.read( buffer, 0, (int) Math.min( buffer.length,
remaining ) );
+ // Read from the stream, block if necessary until either EOF or
buffer is filled.
+ // Filling the buffer has priority since downstream processors
will significantly degrade i/o
+ // performance if called to frequently (large data streams) as
they perform expensive tasks such as
+ // console output or data integrity checks.
+ int nextByte = input.read();
- if ( n == -1 )
+ if ( nextByte == -1 )
{
break;
}
- fireTransferProgress( transferEvent, buffer, n );
+ buffer[0] = ( byte ) nextByte;
+
+ // let's safely cast to int because the min value will be lower
than the buffer size.
+ int length = (int) min( buffer.length, remaining ),
+ read = 1;
+
+ for ( ; read < length ; ++read )
+ {
+ nextByte = input.read();
+ if ( nextByte == -1 )
+ {
+ break;
+ }
+ buffer[read] = ( byte ) nextByte;
+ }
+
+ fireTransferProgress( transferEvent, buffer, read );
- output.write( buffer, 0, n );
+ output.write( buffer, 0, read );
- remaining -= n;
+ remaining -= read;
}
output.flush();
}
+ /**
+ * Provide a buffer suitably sized for efficiently
+ * {@link #transfer(Resource, InputStream, OutputStream, int, long)
transferring}
+ * the given {@link Resource}. For larger files, larger buffers are
provided such that downstream
+ * {@link #fireTransferProgress(TransferEvent, byte[], int) listeners} are
not notified overly frequently.
+ * For instance, transferring gigabyte-sized resources would result in
millions of notifications when using
+ * only a few kilobytes of buffer, drastically slowing transfer since
transfer progress listeners and
+ * notifications are synchronous and may block, e.g. when writing download
progress status to console.
+ *
+ * @param resource must not be null.
+ * @return a byte buffer suitable for the {@link
Resource#getContentLength() content length} of the resource.
+ */
+ protected byte[] bufferForTransferring( Resource resource )
+ {
+ long contentLength = resource.getContentLength();
+
+ if ( contentLength <= 0 )
+ {
+ return new byte[DEFAULT_BUFFER_SIZE];
+ }
+
+ int numberOf4KbSegmentsFor100Chunks = ( ( int ) ( contentLength / ( 4
* 1024 * 100 ) ) );
Review comment:
Kindly asking to specifically review the way the buffer is sized here.
The Idea is that at least the old buffer size (DEFAULT_BUFFER_SIZE) is used
and that buffers should have a size of N * 4Kb. As a rule of thumb, the buffer
should be sized such that at least 100 chunks of data are being processed. I've
capped the buffer at MAXIMUM_BUFFER_SIZE (512 Kb).
Resulting, most of the files transferred (Metadata such as POMs and
checksums, small JARs...) are transferred with the old buffer size. However,
larger files are transferred with up to 512 Kb of buffer.
For a 10GB stream, this will reduce the number of chunks from > 5 million
(the previous strategy almost never used the entire 4k buffer but less than
half) to about 20 thousand chunks.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services