murblanc commented on a change in pull request #1055: SOLR-13932 Review 
directory locking and Blob interactions
URL: https://github.com/apache/lucene-solr/pull/1055#discussion_r362517551
 
 

 ##########
 File path: 
solr/core/src/java/org/apache/solr/store/blob/metadata/ServerSideMetadata.java
 ##########
 @@ -132,35 +127,15 @@ public ServerSideMetadata(String coreName, CoreContainer 
container, boolean take
         generation = latestCommit.getGeneration();
         latestCommitFiles = latestCommitBuilder.build();
 
-        // Capture now the hash and verify again if we need to pull content 
from the Blob store into this directory,
-        // to make sure there are no local changes at the same time that might 
lead to a corruption in case of interaction
-        // with the download.
-        // TODO: revise with "design assumptions around pull pipeline" 
mentioned in allCommits TODO below
+        // Capture now the hash and verify again after files have been pulled 
and before the directory is updated (or before
+        // the index is switched to use a new directory) to make sure there 
are no local changes at the same time that might
+        // lead to a corruption in case of interaction with the download or 
might be a sign of other problems (it is not
+        // expected that indexing can happen on a local directory of a SHARED 
replica if that replica is not up to date with
+        // the Blob store version).
         directoryHash = getSolrDirectoryHash(coreDir);
 
-        allCommitsFiles = latestCommitFiles;
-        // TODO: allCommits was added to detect special cases where inactive 
file segments can potentially conflict
-        //       with whats in shared store. But given the recent 
understanding of semantics around index directory locks
-        //       we need to revise our design assumptions around pull 
pipeline, including this one.
-        //       Disabling this for now so that unreliability around 
introspection of older commits 
-        //       might not get in the way of steady state indexing.
-//        // A note on listCommits says that it does not guarantee consistent 
results if a commit is in progress.
-//        // But in blob context we serialize commits and pulls by proper 
locking therefore we should be good here.
-//        List<IndexCommit> allCommits = DirectoryReader.listCommits(coreDir);
-//
-//        // we should always have a commit point as verified in the beginning 
of this method.
-//        assert (allCommits.size() > 1) || (allCommits.size() == 1 && 
allCommits.get(0).equals(latestCommit));
-//
-//        // optimization:  normally we would only be dealing with one commit 
point. In that case just reuse latest commit files builder.
-//        ImmutableCollection.Builder<CoreFileData> allCommitsBuilder = 
latestCommitBuilder;
-//        if (allCommits.size() > 1) {
-//          allCommitsBuilder = new ImmutableSet.Builder<>();
-//          for (IndexCommit commit : allCommits) {
-//            // no snapshot for inactive segments files
-//            buildCommitFiles(coreDir, commit, allCommitsBuilder, /* 
snapshotDir */ null);
-//          }
-//        }
-//        allCommitsFiles = allCommitsBuilder.build();
+        // Need to inventory all local files in case files that need to be 
pulled from Blob conflict with them.
+        allFiles = ImmutableSet.copyOf(coreDir.listAll());
 
 Review comment:
   You're right on both counts. In order to limit complexity at this stage, I'm 
tempted to leave these optimizations for later (likely very minor improvement 
for indexing given the higher cost of pushing to Blob store).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to