[ 
https://issues.apache.org/jira/browse/SOLR-18121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18058903#comment-18058903
 ] 

Jason Gerlowski commented on SOLR-18121:
----------------------------------------

I think there are other bugs at play here too.  Look at how the 'indexDir' 
Directory instance is used in this code:

{code}
    solrCore.getDirectoryFactory().doneWithDirectory(indexDir);
    solrCore.deleteNonSnapshotIndexFiles(indexDirPath);
    this.solrCore.closeSearcher();
    assert testWait.getAsBoolean();
    
solrCore.getUpdateHandler().getSolrCoreState().closeIndexWriter(this.solrCore, 
false);
    for (String f : filesTobeDeleted) {
      try {
        indexDir.deleteFile(f);
      } catch (FileNotFoundException | NoSuchFileException e) {
        // no problem , it was deleted by someone else
      }
    }
{code}

We call {{DirectoryFactory.doneWithDirectory}} and then go on to use "indexDir" 
~10 lines later.  This whole codepath needs some attention IMO if we ultimately 
decide to keep it around.

> Ref-counting leak in IndexFetcher's "deleteFilesInAdvance" codepath
> -------------------------------------------------------------------
>
>                 Key: SOLR-18121
>                 URL: https://issues.apache.org/jira/browse/SOLR-18121
>             Project: Solr
>          Issue Type: Bug
>          Components: replication (java)
>            Reporter: Jason Gerlowski
>            Priority: Major
>
> While experimenting with TLOG and PULL replicas locally, I think I've 
> discovered a resource-leak in IndexFetcher.
> IndexFetcher's little-known "deleteFilesInAdvance" code-path, which deletes 
> the local copy of the index if space is needed to download the remote copy, 
> doesn't obey the contract of {{SolrCoreState.closeIndexWriter}}.  The problem 
> code in question is around 
> [L1273|https://github.com/apache/solr/blob/main/solr/core/src/java/org/apache/solr/handler/IndexFetcher.java#L1273]:
> {code}
>     
> solrCore.getUpdateHandler().getSolrCoreState().closeIndexWriter(this.solrCore,
>  false);
>     for (String f : filesTobeDeleted) {
>       try {
>         indexDir.deleteFile(f);
>       } catch (FileNotFoundException | NoSuchFileException e) {
>         // no problem , it was deleted by someone else
>       }   
>     }   
> {code}
> The javadocs for {{closeIndexWriter}} indicate:
> bq. Expert method that closes the IndexWriter - you must call {@link 
> #openIndexWriter(SolrCore)} in a finally block after calling this method.
> ...which clearly isn't happening in this case.
> This causes tangible issues in the field - we (Houston and myself) uncovered 
> this issue while debugging a collection-delete operation that stalled out 
> indefinitely.  The bug described above messes up the ref-count for the 
> SolrCore in question, so deleting this core/replica stalled forever waiting 
> for the ref-count to hit 0.
> Side note 1: My reading of the logic here suggests that this problem would 
> happen reliably any time this codepath is invoked.  It was a tough thing to 
> debug, so I wouldn't expect user bug re
> ports necessarily.  But our tests have pretty stringent ref-count checks, and 
> the fact that no tests report this issue suggests to me that this codepath is 
> entirely untested. 
> Side note 2: I have some larger questions about whether this codepath is 
> something we really want to continue supporting in Solr today.  You can 
> imagine scenarios where this could lead to data-loss if a replica gets into 
> this state and then something happens to the leader.  Safer to keep the data 
> that this replica does have around, even if it's slightly less up to date 
> than what the leader has currently.  But I'll raise those on SOLR-12999 which 
> initially introduced the feature.  Let's assume on the JIRA ticket that we do 
> want to keep the feature around.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to