Chris M. Hostetter created SOLR-13908:
-----------------------------------------

             Summary: Possible bugs when using HdfsDirectoryFactory w/ softCommit=true + openSearcher=true
                 Key: SOLR-13908
                 URL: https://issues.apache.org/jira/browse/SOLR-13908
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: hdfs
            Reporter: Chris M. Hostetter


While working on SOLR-13872, something caught my eye that seems fishy...

*Background:*

SOLR-4916 introduced the API
{{DirectoryFactory.searchersReserveCommitPoints()}}, a method that
{{SolrIndexSearcher}} uses to decide whether it needs to explicitly save/release the
{{IndexCommit}} point of its {{DirectoryReader}} with the
{{IndexDeletionPolicyWrapper}}, for use on filesystems that don't in some way
"protect" open files...

{code:title=SolrIndexSearcher}
    if (directoryFactory.searchersReserveCommitPoints()) {
      // reserve commit point for life of searcher
      core.getDeletionPolicy().saveCommitPoint(reader.getIndexCommit().getGeneration());
    }
{code}

{code:title=DirectoryFactory}
  /**
   * If your implementation can count on delete-on-last-close semantics
   * or throws an exception when trying to remove a file in use, return
   * false (eg NFS). Otherwise, return true. Defaults to returning false.
   * 
   * @return true if factory impl requires that Searcher's explicitly
   * reserve commit points.
   */
  public boolean searchersReserveCommitPoints() {
    return false;
  }
{code}

{{HdfsDirectoryFactory}} is (still) the only {{DirectoryFactory}} implementation that
returns {{true}}.
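To make the save/release contract above concrete, here is a minimal sketch that assumes a simple per-generation reference count. {{CommitReservations}} and {{ToySearcher}} are hypothetical names for illustration only; this is not Solr's actual {{IndexDeletionPolicyWrapper}} code. {{saveCommitPoint}} mirrors the call quoted above, and {{releaseCommitPoint}} is modeled on the "release" half of the contract.

{code:java}
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of reference-counted commit-point
 * reservation. NOT Solr's IndexDeletionPolicyWrapper; it only illustrates
 * the save/release contract.
 */
class CommitReservations {
  // commit generation -> number of open searchers currently holding it
  private final Map<Long, Integer> saved = new HashMap<>();

  synchronized void saveCommitPoint(long generation) {
    saved.merge(generation, 1, Integer::sum);
  }

  synchronized void releaseCommitPoint(long generation) {
    Integer count = saved.get(generation);
    if (count == null) {
      throw new IllegalStateException("generation " + generation + " was never saved");
    }
    if (count == 1) {
      saved.remove(generation); // last holder gone: commit may be deleted again
    } else {
      saved.put(generation, count - 1);
    }
  }

  /** A deletion policy would skip deleting any commit reported as reserved. */
  synchronized boolean isReserved(long generation) {
    return saved.containsKey(generation);
  }
}

/** Hypothetical searcher mirroring the conditional reservation in SolrIndexSearcher. */
class ToySearcher implements AutoCloseable {
  private final CommitReservations reservations;
  private final boolean reserve;
  private final long generation;

  ToySearcher(CommitReservations reservations, boolean searchersReserveCommitPoints, long generation) {
    this.reservations = reservations;
    this.reserve = searchersReserveCommitPoints;
    this.generation = generation;
    if (reserve) {
      // reserve commit point for life of searcher
      reservations.saveCommitPoint(generation);
    }
  }

  @Override
  public void close() {
    if (reserve) {
      reservations.releaseCommitPoint(generation);
    }
  }
}
{code}

With two searchers open on the same generation, the commit stays reserved until both have closed. When {{searchersReserveCommitPoints()}} returns {{false}}, nothing is reserved and the filesystem's own open-file semantics are relied on instead.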

----

*Concern:*

As noted in LUCENE-9040, the behavior of {{DirectoryReader.getIndexCommit()}}
is a little weird / underspecified when dealing with an "NRT" {{IndexReader}}
(opened directly off of an {{IndexWriter}} using "un-committed" changes) ...
which is exactly what {{SolrIndexSearcher}} is using in Solr setups that use
{{softCommit=true&openSearcher=true}}.

In particular, the {{IndexCommit.getGeneration()}} value that will be used when
{{SolrIndexSearcher}} executes
{{core.getDeletionPolicy().saveCommitPoint(reader.getIndexCommit().getGeneration());}}
will be (as of the current code) the {{generation}} of the last _hard_ commit.
This means that segment/data files created since the last "hard commit" will not be
protected from deletion if additional commits/merges happen on the index
during the life of the {{SolrIndexSearcher}}, either via concurrent rapid
commits, or via {{commit=true&softCommit=false&openSearcher=false}}.

I have not investigated this in depth, but I believe there is a risk here of
unpredictable bugs when using HDFS in conjunction with
{{softCommit=true&openSearcher=true}}.






--
This message was sent by Atlassian Jira
(v8.3.4#803005)
