nfsantos commented on code in PR #886:
URL: https://github.com/apache/jackrabbit-oak/pull/886#discussion_r1154621454
##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/Downloader.java:
##########
@@ -147,14 +163,49 @@ public ItemResponse call() throws Exception {
sourceUrl.setConnectTimeout(Downloader.this.connectTimeoutMs);
sourceUrl.setReadTimeout(Downloader.this.readTimeoutMs);
+ // Updating a MessageDigest from multiple threads is not thread
safe, so we cannot reuse a single instance.
+ // Creating a new instance is a lightweight operation, no need to
increase complexity by creating a pool.
+ MessageDigest md = null;
+ if (Downloader.this.checksumAlgorithm != null && item.checksum !=
null) {
+ md =
MessageDigest.getInstance(Downloader.this.checksumAlgorithm);
+ }
+
Path destinationPath = Paths.get(item.destination);
Files.createDirectories(destinationPath.getParent());
- long size;
+ long size = 0;
try (ReadableByteChannel byteChannel =
Channels.newChannel(sourceUrl.getInputStream());
FileOutputStream outputStream = new
FileOutputStream(destinationPath.toFile())) {
- size = outputStream.getChannel()
- .transferFrom(byteChannel, 0, Long.MAX_VALUE);
+ ByteBuffer buffer = ByteBuffer.allocate(1024);
+ int bytesRead;
+ while ((bytesRead = byteChannel.read(buffer)) != -1) {
+ buffer.flip();
+ if (md != null) {
+ md.update(buffer);
+ }
+ size += bytesRead;
+ outputStream.getChannel().write(buffer);
+ buffer.clear();
+ }
Review Comment:
I'm wondering if the performance of doing the transfer with buffers in the
Java heap will not be significantly worse than using `transferFrom`, which in
principle can use low-level mechanisms of the JVM/OS. In ideal conditions, the
transferFrom could even be zero copy. Maybe it would be faster to still use
`transferFrom` and then compute the checksum after the file is written? Or
maybe the transferFrom is not much better than copying in the heap or we are
limited by the network transfer and it does not really matter.
The buffer of 1024 seems small, could hurt performance. I guess the buffer
here should be large enough to read a large chunk of the TPC receive buffer.
I'm not sure what is the default size, but it is for sure larger than 1024
bytes. I found this which suggests that the buffer is close to 80KB. Maybe here
it is better to increase to 16KB or even 32KB, to reduce the CPU overhead.
https://man7.org/linux/man-pages/man7/tcp.7.html
> tcp_rmem (since Linux 2.4)
> default
> the default size of the receive buffer for a TCP
> socket. This value overwrites the initial default
> buffer size from the generic global
> net.core.rmem_default defined for all protocols.
> The default value is 87380 bytes. (On Linux 2.4,
> this will be lowered to 43689 in low-memory
> systems.) If larger receive buffer sizes are
> desired, this value should be increased (to affect
> all sockets). To employ large TCP windows, the
> net.ipv4.tcp_window_scaling must be enabled
> (default).
##########
oak-run/src/main/java/org/apache/jackrabbit/oak/run/Downloader.java:
##########
@@ -147,14 +163,49 @@ public ItemResponse call() throws Exception {
sourceUrl.setConnectTimeout(Downloader.this.connectTimeoutMs);
sourceUrl.setReadTimeout(Downloader.this.readTimeoutMs);
+ // Updating a MessageDigest from multiple threads is not thread
safe, so we cannot reuse a single instance.
+ // Creating a new instance is a lightweight operation, no need to
increase complexity by creating a pool.
+ MessageDigest md = null;
+ if (Downloader.this.checksumAlgorithm != null && item.checksum !=
null) {
+ md =
MessageDigest.getInstance(Downloader.this.checksumAlgorithm);
+ }
+
Path destinationPath = Paths.get(item.destination);
Files.createDirectories(destinationPath.getParent());
- long size;
+ long size = 0;
try (ReadableByteChannel byteChannel =
Channels.newChannel(sourceUrl.getInputStream());
FileOutputStream outputStream = new
FileOutputStream(destinationPath.toFile())) {
Review Comment:
Build a FileChannel directly instead of getting the file channel from an
output stream. Something like this (not tested):
```
SeekableByteChannel outputStream = Files.newByteChannel(destinationPath,
StandardOpenOption.WRITE, StandardOpenOption.CREATE_NEW)
```
Or else just use streams so we do not mix channels and streams.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]