Hi Rahul,

Please do not pursue Approach 2 :) ReadersAndUpdates.release is not something the application should be calling. This path can only lead to pain.

It sounds to me like something in Solr is holding an old reader open (maybe the last commit point, or the reader from before the refresh that followed your reindexing of all docs in a given, now 100% deleted, segment).

Does Solr keep old readers open, older than the most recent commit? Do you have queries in flight that might be holding the old reader open?

Given that your small by-hand test case (3 docs) correctly showed the 100% deleted segment being reclaimed after the soft commit interval or a manual hard commit, something must be different in the larger use case that is causing Solr to keep an old reader open. Is there any logging you can enable to understand Solr's handling of its IndexReaders' lifecycle?

Mike McCandless

http://blog.mikemccandless.com

On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <rahul196...@gmail.com> wrote:

> Hello,
> I am trying to execute a program to read documents segment-by-segment and
> reindex to the same index. I am reading using Lucene apis and indexing
> using solr api (in a core that is currently loaded).
>
> What I am observing is that even after a segment has been fully processed
> and an autoCommit (as well as autoSoftCommit ) has kicked in, the segment
> with 0 live docs gets left behind. *Upon Solr restart, the segment does get
> cleared successfully.*
>
> I tried to replicate same thing without the code by indexing 3 docs on an
> empty test core, and then reindexing the same docs. The older segment gets
> deleted as soon as softCommit interval hits or an explicit commit=true is
> called.
>
> Here are the two approaches that I have tried. Approach 2 is inspired by
> the merge logic of accessing segments in case opening a DirectoryReader
> (Approach 1) externally is causing this issue.
>
> But both approaches leave undeleted segments behind until I restart Solr
> and load the core again. What am I missing? I don't have any more brain
> cells left to fry on this!
>
> Approach 1:
> =========
> try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
>      IndexReader reader = DirectoryReader.open(dir)) {
>   for (LeafReaderContext lrc : reader.leaves()) {
>
>     // read live docs from each leaf, create a SolrInputDocument
>     // out of Document and index using Solr api
>
>   }
> } catch (Exception e) {
>
> }
>
> Approach 2:
> ==========
> ReadersAndUpdates rld = null;
> SegmentReader segmentReader = null;
> RefCounted<IndexWriter> iwRef =
>     core.getSolrCoreState().getIndexWriter(core);
> iw = iwRef.get();
> try {
>   for (SegmentCommitInfo sci : segmentInfos) {
>     rld = iw.getPooledInstance(sci, true);
>     segmentReader = rld.getReader(IOContext.READ);
>
>     // process all live docs similar to above using the segmentReader
>
>     rld.release(segmentReader);
>     iw.release(rld);
>   }
> } finally {
>   if (iwRef != null) {
>     iwRef.decref();
>   }
> }
>
> Help would be much appreciated!
>
> Thanks,
> Rahul
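
To illustrate the suspicion about an old reader pinning a fully deleted segment, here is a toy Java sketch of reference-counted segment files. It is not Lucene or Solr code: ToyIndex, ToyReader, and every method on them are hypothetical names made up for this example, and the real bookkeeping is more involved. It only shows why a segment with 0 live docs can stay on disk until the last reader that references it is closed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model only: none of these classes are Lucene or Solr APIs.
final class SegmentRefCountSketch {

    /** Counts how many holders (the current commit, open readers) reference each segment. */
    static final class ToyIndex {
        private final Map<String, Integer> refCounts = new HashMap<>();
        private List<String> liveSegments = new ArrayList<>();

        /** Publishes a new set of live segments and drops the previous commit's references. */
        void commit(List<String> newLiveSegments) {
            List<String> previous = liveSegments;
            liveSegments = new ArrayList<>(newLiveSegments);
            liveSegments.forEach(this::incRef); // take the new references first
            previous.forEach(this::decRef);     // then release the old commit's references
        }

        /** Opens a point-in-time reader that pins the currently live segments. */
        ToyReader openReader() {
            List<String> snapshot = new ArrayList<>(liveSegments);
            snapshot.forEach(this::incRef);
            return new ToyReader(this, snapshot);
        }

        /** Segments whose files would still be present on disk. */
        TreeSet<String> segmentsOnDisk() {
            return new TreeSet<>(refCounts.keySet());
        }

        private void incRef(String segment) {
            refCounts.merge(segment, 1, Integer::sum);
        }

        private void decRef(String segment) {
            // Returning null removes the entry, i.e. the segment is reclaimed at zero references.
            refCounts.computeIfPresent(segment, (name, count) -> count == 1 ? null : count - 1);
        }
    }

    /** A reader that keeps its segments alive until it is closed. */
    static final class ToyReader implements AutoCloseable {
        private final ToyIndex owner;
        private final List<String> pinned;
        private boolean closed;

        ToyReader(ToyIndex owner, List<String> pinned) {
            this.owner = owner;
            this.pinned = pinned;
        }

        @Override
        public void close() {
            if (!closed) {
                closed = true;
                pinned.forEach(owner::decRef);
            }
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.commit(List.of("_0"));                // original docs live in segment _0
        ToyReader oldReader = index.openReader();   // e.g. a reader opened before the reindex

        index.commit(List.of("_1"));                // all docs reindexed into _1; _0 is 100% deleted
        System.out.println(index.segmentsOnDisk()); // prints [_0, _1]: _0 is pinned by oldReader

        oldReader.close();
        System.out.println(index.segmentsOnDisk()); // prints [_1]: _0 is reclaimed
    }
}
```

In this model, if oldReader is never closed the old segment is never reclaimed, which would be consistent with the segment only disappearing after a Solr restart.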