bruno-roustant commented on a change in pull request #68:
URL: https://github.com/apache/solr/pull/68#discussion_r613062024
##########
File path:
solr/contrib/blob-directory/src/java/org/apache/solr/blob/BlobPusher.java
##########
@@ -17,36 +17,64 @@
package org.apache.solr.blob;
+import org.apache.lucene.util.IOUtils;
+import org.apache.solr.common.util.ExecutorUtil;
+import org.apache.solr.common.util.SolrNamedThreadFactory;
+import org.apache.solr.core.backup.repository.BackupRepository;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
+import java.io.OutputStream;
import java.lang.invoke.MethodHandles;
+import java.net.URI;
import java.util.Collection;
import java.util.List;
+import java.util.Objects;
import java.util.concurrent.*;
+import java.util.regex.Pattern;
import java.util.stream.Collectors;
-import org.apache.lucene.util.IOUtils;
-import org.apache.solr.common.util.ExecutorUtil;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/** Pushes a set of files to Blob, and works with listings. */
+/**
+ * Pushes a set of files to Blob, and works with listings.
+ */
public class BlobPusher implements Closeable {
private static final Logger log =
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
- // WORK IN PROGRESS!!
+ private static final String PATH_SEPARATOR = "/";
+ private static final Pattern PATH_SPLITTER = Pattern.compile(PATH_SEPARATOR);
- private final BlobStore blobStore;
-  private final ExecutorService executor = ExecutorUtil.newMDCAwareCachedThreadPool("blobPusher");
+ // WORK IN PROGRESS!!
- public BlobPusher(BlobStore blobStore) {
- this.blobStore = blobStore;
+ private final BackupRepository backupRepository;
+ private final URI backupLocation;
+ private final ExecutorService executor;
+  // Pool of stream buffers, one per executor thread, to reuse them as the buffer size may be large
+  // to have efficient stream transfer to the remote backup repository.
+ private final ThreadLocal<byte[]> streamBuffers;
Review comment:
       I think it is worth it, and it adds little complexity since the code is
short. Looking at the S3 BackupRepository implementation, I noticed that the
stream buffers are huge: 16 MB. I don't know whether that size was actually
benchmarked, but it seems sensible that a large buffer is needed to amortize
the latency of calls to a remote repository. The goal here is to avoid
allocating such large buffers again and again.
       I used this technique years ago in a server, and for large buffers it
made a real difference in performance.
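
       To illustrate the idea, here is a minimal sketch of the per-thread
buffer pool (class and method names are illustrative, not from this PR): each
executor thread lazily allocates one large buffer via `ThreadLocal` and reuses
it for every stream transfer, instead of allocating a fresh multi-MB array per
call.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class BufferPoolSketch {

  // Large buffer to amortize per-call latency to a remote repository
  // (the S3 BackupRepository uses 16 MB; kept smaller here for the sketch).
  private static final int BUFFER_SIZE = 1 << 20; // 1 MB

  // One buffer per thread, allocated lazily on first use and then reused
  // for every subsequent transfer performed by that thread.
  private static final ThreadLocal<byte[]> STREAM_BUFFERS =
      ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

  /**
   * Copies the input stream to the output stream using the calling thread's
   * pooled buffer, avoiding a new large allocation on each call.
   */
  public static long transfer(InputStream in, OutputStream out) throws IOException {
    byte[] buffer = STREAM_BUFFERS.get();
    long total = 0;
    int read;
    while ((read = in.read(buffer)) != -1) {
      out.write(buffer, 0, read);
      total += read;
    }
    return total;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[1024];
    ByteArrayOutputStream sink = new ByteArrayOutputStream();
    long copied = transfer(new ByteArrayInputStream(data), sink);
    System.out.println(copied);
  }
}
```

       Note that with a cached (unbounded) thread pool the per-thread buffers
are only reclaimed when idle threads die, so this works best with a bounded
number of executor threads.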
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]