nfsantos commented on code in PR #886:
URL: https://github.com/apache/jackrabbit-oak/pull/886#discussion_r1156812355


##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/Downloader.java:
##########
@@ -147,14 +163,49 @@ public ItemResponse call() throws Exception {
             sourceUrl.setConnectTimeout(Downloader.this.connectTimeoutMs);
             sourceUrl.setReadTimeout(Downloader.this.readTimeoutMs);
 
+            // Updating a MessageDigest from multiple threads is not thread 
safe, so we cannot reuse a single instance.
+            // Creating a new instance is a lightweight operation, no need to 
increase complexity by creating a pool.
+            MessageDigest md = null;
+            if (Downloader.this.checksumAlgorithm != null && item.checksum != 
null) {
+                md = 
MessageDigest.getInstance(Downloader.this.checksumAlgorithm);
+            }
+
             Path destinationPath = Paths.get(item.destination);
             Files.createDirectories(destinationPath.getParent());
-            long size;
+            long size = 0;
             try (ReadableByteChannel byteChannel = 
Channels.newChannel(sourceUrl.getInputStream());
                  FileOutputStream outputStream = new 
FileOutputStream(destinationPath.toFile())) {
-                size = outputStream.getChannel()
-                        .transferFrom(byteChannel, 0, Long.MAX_VALUE);
+                ByteBuffer buffer = ByteBuffer.allocate(1024);
+                int bytesRead;
+                while ((bytesRead = byteChannel.read(buffer)) != -1) {
+                    buffer.flip();
+                    if (md != null) {
+                        md.update(buffer);
+                    }
+                    size += bytesRead;
+                    outputStream.getChannel().write(buffer);
+                    buffer.clear();
+                }

Review Comment:
   >The buffer-size is now configurable. The default is 8k which should be a 
safe default buffer size to use. It can be changed if needed.
   
   I'd suggest increasing the default to 16KB, which may help decreasing CPU 
usage. The performance seems to be the same between 8 and 16KB buffer, but I'm 
wondering about the CPU usage.
   
   With a buffer size of 8K and a data rate of 382MB/sec, the downloader will 
be doing up to 100K system calls per second (382MB*1024 / 8KB *2, one syscall 
to read and maybe another to write, possibly less than one per write because of 
write buffering in the JVM). That seems like a very high number, and given that 
syscalls have a moderate cost, this could result in unnecessary CPU usage.
   
   If you don't see any drawback to increasing the buffer to 16KB, then this 
could reduce CPU usage, which is always good in a cloud environment. Even with 
1000 download threads, 16KB buffers would be 16MB of RAM, not much.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to