[ 
https://issues.apache.org/jira/browse/WAGON-537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16664245#comment-16664245
 ] 

ASF GitHub Bot commented on WAGON-537:
--------------------------------------

olaf-otto commented on a change in pull request #50: WAGON-537 Maven download 
speed of large artifacts is slow
URL: https://github.com/apache/maven-wagon/pull/50#discussion_r228322159
 
 

 ##########
 File path: 
wagon-provider-api/src/main/java/org/apache/maven/wagon/AbstractWagon.java
 ##########
 @@ -560,31 +564,78 @@ protected void transfer( Resource resource, InputStream 
input, OutputStream outp
     protected void transfer( Resource resource, InputStream input, 
OutputStream output, int requestType, long maxSize )
         throws IOException
     {
-        byte[] buffer = new byte[DEFAULT_BUFFER_SIZE];
+        byte[] buffer = bufferForTransferring( resource );
 
         TransferEvent transferEvent = new TransferEvent( this, resource, 
TransferEvent.TRANSFER_PROGRESS, requestType );
         transferEvent.setTimestamp( System.currentTimeMillis() );
 
         long remaining = maxSize;
         while ( remaining > 0 )
         {
-            // let's safely cast to int because the min value will be lower 
than the buffer size.
-            int n = input.read( buffer, 0, (int) Math.min( buffer.length, 
remaining ) );
+            // Read from the stream, block if necessary until either EOF or 
buffer is filled.
+            // Filling the buffer has priority since downstream processors 
will significantly degrade i/o
+            // performance if called to frequently (large data streams) as 
they perform expensive tasks such as
+            // console output or data integrity checks.
+            int nextByte = input.read();
 
-            if ( n == -1 )
+            if ( nextByte == -1 )
             {
                 break;
             }
 
-            fireTransferProgress( transferEvent, buffer, n );
+            buffer[0] = ( byte ) nextByte;
+
+            // let's safely cast to int because the min value will be lower 
than the buffer size.
+            int length = (int) min( buffer.length, remaining ),
+                read = 1;
+
+            for ( ; read < length ; ++read )
+            {
+                nextByte = input.read();
+                if ( nextByte == -1 )
+                {
+                    break;
+                }
+                buffer[read] = ( byte ) nextByte;
+            }
+
+            fireTransferProgress( transferEvent, buffer, read );
 
 Review comment:
   Hi @michael-o 
   
   I am going to retract htis pull request to change as I am not satisfied with 
the solution yet. As it happens, the same issue exist when uploading large 
files (100+ MB) using maven. Thee, however, buffering has a different effect as 
the underlying stream usually stems from a file.
   
   I have decided that I will address both issues. The best course of action is 
thus to change the scope of WAGON-537 and adopt a more generic solution (using 
nio) that will fix the issue for both cases. I will open up a ne wpoull request 
asap.
   
   Thanks again for your review!
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


> Maven download speed of large artifacts is slow due to unsuitable buffer 
> strategy for remote Artifacts in AbstractWagon
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: WAGON-537
>                 URL: https://issues.apache.org/jira/browse/WAGON-537
>             Project: Maven Wagon
>          Issue Type: Improvement
>          Components: wagon-provider-api
>    Affects Versions: 3.2.0
>         Environment: Windows 10, JDK 1.8, Nexus  Artifact store > 100MB/s 
> network connection.
>            Reporter: Olaf Otto
>            Assignee: Michael Osipov
>            Priority: Major
>              Labels: perfomance
>         Attachments: wagon-issue.png
>
>
> We are using maven for build process automation with docker. This sometimes 
> involves downloading images with a few gigabytes in size. Here, maven's 
> download speed is consistently and reproducibly slow. For instance, an 
> artifact with 7,5 GB in size took almost two hours to transfer in spite of a 
> 100 MB/s connection with respective reproducible download speed from the 
> remote nexus artifact repository when using a browser to download.
> I have investigated the issue using JProfiler. The result clearly shows a 
> significant issue in AbstractWagon's transfer( Resource resource, InputStream 
> input, OutputStream output, int requestType, long maxSize ) method used for 
> remote artifacts.
> Here, the input stream is read in a loop using a 4 Kb buffer. Whenever data 
> is received, the received data is pushed to downstream listeners via 
> fireTransferProgress. These listeners (or rather consumers) perform  
> expensive tasks such as checksumming or printing to console.
> Now, the underlying InputStream implementation used in transfer will return 
> calls to read(bugger, offset, length) as soon as *some* data is available. 
> That is, fireTransferProgress is invoked with an average number of bytes less 
> than half the buffer capacity (this varies with the underlying network and 
> hardware architecture). Consequently, fireTransferProgress is invoked 
> *millions of times* for large files. As this is a blocking operation, the 
> time spent in fireTransferProgress dominates and drastically slows down the 
> transfer by at least one order of magnitude. 
> !wagon-issue.png! 
> In our case, we found download speed reduced from a theoretical optimum of 
> ~80 seconds to to more than 3200 seconds.
> From an architectural perspective, I would not want to make the consumers / 
> listeners invoked via fireTransferProgress aware of their potential impact on 
> download speed, but rather refactor the transfer method such that it uses a 
> buffer strategy reducing the the number of fireTransferProgress invocations. 
> This should be done with regard to the expected file size of the transfer, 
> such that fireTransferProgress is invoked often enough but not to frequent.
> I have implemented a solution and transfer speed went up more than one order 
> of magnitude. I will provide a pull request asap.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to