bruno-roustant commented on a change in pull request #68:
URL: https://github.com/apache/solr/pull/68#discussion_r613062024
##########
File path:
solr/contrib/blob-directory/src/java/org/apache/solr/blob/BlobPusher.java
##########
@@ -17,36 +17,64 @@
package org.apache.solr.blob;
+import org.apache.lucene.util.IOUtils;
+import org.apache.solr.common.util.ExecutorUtil;
+import org.apache.solr.common.util.SolrNamedThreadFactory;
+import org.apache.solr.core.backup.repository.BackupRepository;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
+import java.io.OutputStream;
import java.lang.invoke.MethodHandles;
+import java.net.URI;
import java.util.Collection;
import java.util.List;
+import java.util.Objects;
import java.util.concurrent.*;
+import java.util.regex.Pattern;
import java.util.stream.Collectors;
-import org.apache.lucene.util.IOUtils;
-import org.apache.solr.common.util.ExecutorUtil;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/** Pushes a set of files to Blob, and works with listings. */
+/**
+ * Pushes a set of files to Blob, and works with listings.
+ */
public class BlobPusher implements Closeable {
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
- // WORK IN PROGRESS!!
+ private static final String PATH_SEPARATOR = "/";
+ private static final Pattern PATH_SPLITTER = Pattern.compile(PATH_SEPARATOR);
- private final BlobStore blobStore;
-  private final ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool("blobPusher");
+ // WORK IN PROGRESS!!
- public BlobPusher(BlobStore blobStore) {
- this.blobStore = blobStore;
+ private final BackupRepository backupRepository;
+ private final URI backupLocation;
+ private final ExecutorService executor;
+  // Pool of stream buffers, one per executor thread, to reuse them as the buffer size may be large
+  // to have efficient stream transfer to the remote backup repository.
+ private final ThreadLocal<byte[]> streamBuffers;
Review comment:
       I think it is worth it, and it adds little complexity since the code is
short. Looking at the S3 BackupRepository implementation, I noticed that the
stream buffers are huge: 16 MB. I don't know whether that size was actually
benchmarked, but it seems sensible that a large buffer is needed to amortize
the latency of calls to a remote repository. The goal here is to avoid
allocating such large buffers again and again.
       I used this technique years ago in a server, and for large buffers it
made a real difference in performance.
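
       To illustrate the idea, here is a minimal sketch of the per-thread
buffer pool (class and method names are illustrative, not from this PR): each
executor thread lazily allocates one large buffer via `ThreadLocal` and reuses
it for every stream transfer, instead of allocating a fresh multi-MB array per
call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferPoolSketch {

  // Large buffer to amortize per-call latency to a remote repository
  // (the S3 BackupRepository uses 16 MB; kept smaller here for the sketch).
  private static final int BUFFER_SIZE = 1 << 20; // 1 MB

  // One buffer per thread, allocated lazily on first use and then reused
  // for every subsequent transfer performed by that thread.
  private static final ThreadLocal<byte[]> STREAM_BUFFERS =
      ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

  /**
   * Copies the input stream to the output stream using the calling thread's
   * pooled buffer, avoiding a new large allocation on each call.
   */
  public static long transfer(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = STREAM_BUFFERS.get();
    long total = 0;
    int read;
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
      total += read;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[1024];
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long copied = transfer(new ByteArrayInputStream(data), sink);
    System.out.println(copied);
  }
}
```

       Note that with a cached (unbounded) thread pool the per-thread buffers
are only reclaimed when idle threads die, so this works best with a bounded
number of executor threads.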
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]