Chris M. Hostetter created SOLR-13908: -----------------------------------------
Summary: Possible bugs when using HdfsDirectoryFactory w/ softCommit=true + openSearcher=true
Key: SOLR-13908
URL: https://issues.apache.org/jira/browse/SOLR-13908
Project: Solr
Issue Type: Bug
Security Level: Public (Default Security Level. Issues are Public)
Components: hdfs
Reporter: Chris M. Hostetter

While working on SOLR-13872 something caught my eye that seems fishy....

*Background:*

SOLR-4916 introduced the API {{DirectoryFactory.searchersReserveCommitPoints()}} -- a method that {{SolrIndexSearcher}} uses to decide if it needs to explicitly save/release the {{IndexCommit}} point of its {{DirectoryReader}} with the {{IndexDeletionPolicyWrapper}}, for use on filesystems that don't in some way "protect" open files...

{code:title=SolrIndexSearcher}
if (directoryFactory.searchersReserveCommitPoints()) {
  // reserve commit point for life of searcher
  core.getDeletionPolicy().saveCommitPoint(reader.getIndexCommit().getGeneration());
}
{code}

{code:title=DirectoryFactory}
/**
 * If your implementation can count on delete-on-last-close semantics
 * or throws an exception when trying to remove a file in use, return
 * false (eg NFS). Otherwise, return true. Defaults to returning false.
 *
 * @return true if factory impl requires that Searcher's explicitly
 * reserve commit points.
 */
public boolean searchersReserveCommitPoints() {
  return false;
}
{code}

{{HdfsDirectoryFactory}} is (still) the only {{DirectoryFactory}} impl that returns {{true}}.

----

*Concern:*

As noted in LUCENE-9040, the behavior of {{DirectoryReader.getIndexCommit()}} is a little weird / underspecified when dealing with an "NRT" {{IndexReader}} (opened directly off of an {{IndexWriter}} using "un-committed" changes) ... which is exactly what {{SolrIndexSearcher}} is using in Solr setups that use {{softCommit=true&openSearcher=true}}.
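To make the suspected failure mode concrete, here is a deliberately simplified toy model in plain Java. It is *not* Solr or Lucene code: the class {{NrtReservationSketch}}, its methods, and the file names {{_0.cfs}}/{{_1.cfs}}/{{_2.cfs}} are all hypothetical. It assumes (as this issue describes) that an NRT searcher reserves the generation of the last _hard_ commit, and it intentionally ignores any other file protection Lucene's {{IndexWriter}} may apply, so it only illustrates generation-based reservation in the spirit of {{saveCommitPoint(generation)}}.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical toy model, NOT Solr/Lucene code: it models only generation-based
// commit point reservation (in the spirit of saveCommitPoint(generation)) and
// deliberately ignores any other file protection Lucene may provide.
class NrtReservationSketch {
    private final Map<Long, Set<String>> commits = new HashMap<>(); // hard commit gen -> its files
    private final Set<Long> reservedGens = new HashSet<>();         // gens reserved by open searchers
    private final Set<String> directory = new HashSet<>();          // files currently "on disk"
    private final Set<String> writerFiles = new HashSet<>();        // files the writer still references
    private long lastHardCommitGen = 0;

    /** Flush a new segment (e.g. on soft commit); no hard commit references it yet. */
    void flush(String file) {
        writerFiles.add(file);
        directory.add(file);
    }

    /** Merge inputs into output; the writer stops referencing the inputs. */
    void merge(Set<String> inputs, String output) {
        writerFiles.removeAll(inputs);
        writerFiles.add(output);
        directory.add(output);
        deleteUnreferenced();
    }

    /** Hard commit: record the writer's current files under a new generation. */
    void hardCommit() {
        long gen = ++lastHardCommitGen;
        commits.put(gen, new HashSet<>(writerFiles));
        // keep only the newest commit, plus any generation a searcher reserved
        commits.keySet().removeIf(g -> g != gen && !reservedGens.contains(g));
        deleteUnreferenced();
    }

    /** Open an "NRT searcher": it sees every writer file, but (by assumption)
     *  reserves only the generation of the last hard commit. */
    Set<String> openNrtSearcherAndReserve() {
        reservedGens.add(lastHardCommitGen);
        return new HashSet<>(writerFiles);
    }

    /** Delete every file referenced neither by the writer nor by a retained commit. */
    private void deleteUnreferenced() {
        Set<String> live = new HashSet<>(writerFiles);
        for (Set<String> files : commits.values()) {
            live.addAll(files);
        }
        directory.retainAll(live);
    }

    /** Runs the scenario; maps each file the searcher uses to whether it still exists. */
    static Map<String, Boolean> simulate() {
        NrtReservationSketch idx = new NrtReservationSketch();
        idx.flush("_0.cfs");
        idx.hardCommit();                                            // gen 1 = {_0.cfs}
        idx.flush("_1.cfs");                                         // visible via soft commit only
        Set<String> searcherFiles = idx.openNrtSearcherAndReserve(); // reserves gen 1
        idx.merge(Set.of("_0.cfs", "_1.cfs"), "_2.cfs");
        idx.hardCommit();                                            // gen 2 = {_2.cfs}
        Map<String, Boolean> stillPresent = new TreeMap<>();
        for (String f : searcherFiles) {
            stillPresent.put(f, idx.directory.contains(f));
        }
        return stillPresent;
    }

    public static void main(String[] args) {
        simulate().forEach((f, ok) -> System.out.println(f + " still present: " + ok));
    }
}
```

In this model {{_0.cfs}} survives because generation 1 is reserved, while {{_1.cfs}}, which existed only in the NRT view, is deleted as soon as the merge drops it from the writer, even though the open searcher still uses it. Whether the real {{IndexWriter}} protects such files by some other means is exactly what the toy model leaves open.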
In particular, the {{IndexCommit.getGeneration()}} value used when {{SolrIndexSearcher}} executes {{core.getDeletionPolicy().saveCommitPoint(reader.getIndexCommit().getGeneration());}} will be (as of the current code) the {{generation}} of the last _hard_ commit. This means that new segment/data files written since the last "hard commit" will not be protected from deletion if additional commits/merges happen on the index during the life of the {{SolrIndexSearcher}} -- either via concurrent rapid commits, or via {{commit=true&softCommit=false&openSearcher=false}}.

I have not investigated this in depth, but I believe there is a risk of unpredictable bugs when using HDFS in conjunction with {{softCommit=true&openSearcher=true}}.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)