Thanks for the response, Mikhail. I don't think forceMergeDeletes() is what I am looking for, though: it could be more expensive than I would like, and I only want the unreferenced segments with 0 live docs to be deleted, just the way they get deleted with a commit=true option or even a softCommit.
Another piece of important information that I left out earlier: when I examine the segments referenced by the segments_* files, these segments (with 0 live docs) are no longer part of it, but they are still not cleared.

Would appreciate more lines of thought!

Thanks,
Rahul

On Tue, Aug 29, 2023 at 2:46 AM Mikhail Khludnev <m...@apache.org> wrote:

> Hi Rahul.
> Are you looking for
> https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()
> ?
>
> On Tue, Aug 29, 2023 at 5:20 AM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Hello,
> > I am trying to execute a program to read documents segment-by-segment and
> > reindex to the same index. I am reading using Lucene apis and indexing
> > using solr api (in a core that is currently loaded).
> >
> > What I am observing is that even after a segment has been fully processed
> > and an autoCommit (as well as autoSoftCommit) has kicked in, the segment
> > with 0 live docs gets left behind. *Upon Solr restart, the segment does get
> > cleared successfully.*
> >
> > I tried to replicate same thing without the code by indexing 3 docs on an
> > empty test core, and then reindexing the same docs. The older segment gets
> > deleted as soon as softCommit interval hits or an explicit commit=true is
> > called.
> >
> > Here are the two approaches that I have tried. Approach 2 is inspired by
> > the merge logic of accessing segments in case opening a DirectoryReader
> > (Approach 1) externally is causing this issue.
> >
> > But both approaches leave undeleted segments behind until I restart Solr
> > and load the core again. What am I missing? I don't have any more brain
> > cells left to fry on this!
> >
> > Approach 1:
> > =========
> > try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
> >      IndexReader reader = DirectoryReader.open(dir)) {
> >   for (LeafReaderContext lrc : reader.leaves()) {
> >
> >     // read live docs from each leaf, create a
> >     // SolrInputDocument out of Document and index using Solr api
> >
> >   }
> > } catch (Exception e) {
> >
> > }
> >
> > Approach 2:
> > ==========
> > ReadersAndUpdates rld = null;
> > SegmentReader segmentReader = null;
> > RefCounted<IndexWriter> iwRef =
> >     core.getSolrCoreState().getIndexWriter(core);
> > IndexWriter iw = iwRef.get();
> > try {
> >   for (SegmentCommitInfo sci : segmentInfos) {
> >     rld = iw.getPooledInstance(sci, true);
> >     segmentReader = rld.getReader(IOContext.READ);
> >
> >     // process all live docs similar to above using the segmentReader.
> >
> >     rld.release(segmentReader);
> >     iw.release(rld);
> >   }
> > } finally {
> >   if (iwRef != null) {
> >     iwRef.decref();
> >   }
> > }
> >
> > Help would be much appreciated!
> >
> > Thanks,
> > Rahul
> >
>
> --
> Sincerely yours
> Mikhail Khludnev
>
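
P.S. In case it helps anyone reproduce the check mentioned at the top of this message (segments referenced by the segments_* file vs. what is still sitting in the index directory), here is a rough sketch of the comparison. The class and method names (UnreferencedSegments, segmentName, find) are made up for illustration and are not Lucene or Solr APIs. I am assuming the two inputs would come from something like SegmentInfos.readLatestCommit(dir).files(true) and dir.listAll(); the sample file names in main() are invented.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class UnreferencedSegments {

    // Extracts the segment name ("_0") from an index file name such as
    // "_0.cfs" or "_0_1.liv". Returns null for non-segment files
    // (segments_N, write.lock, ...).
    public static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        int underscore = fileName.indexOf('_', 1);
        int dot = fileName.indexOf('.');
        if (underscore != -1) {
            end = Math.min(end, underscore);
        }
        if (dot != -1) {
            end = Math.min(end, dot);
        }
        return fileName.substring(0, end);
    }

    // Returns the names of segments that still have files on disk but are
    // not referenced by the latest commit.
    public static Set<String> find(Collection<String> referencedFiles, String[] filesOnDisk) {
        Set<String> referencedSegments = new HashSet<>();
        for (String f : referencedFiles) {
            String seg = segmentName(f);
            if (seg != null) {
                referencedSegments.add(seg);
            }
        }
        Set<String> leftBehind = new TreeSet<>();
        for (String f : filesOnDisk) {
            String seg = segmentName(f);
            if (seg != null && !referencedSegments.contains(seg)) {
                leftBehind.add(seg);
            }
        }
        return leftBehind;
    }

    public static void main(String[] args) {
        // Invented sample data: the commit references only segment _1,
        // but files of segment _0 are still present in the directory.
        Collection<String> referenced =
            Arrays.asList("segments_3", "_1.cfs", "_1.cfe", "_1.si");
        String[] onDisk = {
            "segments_3", "write.lock",
            "_0.cfs", "_0.cfe", "_0.si", "_0_1.liv",
            "_1.cfs", "_1.cfe", "_1.si"
        };
        System.out.println(find(referenced, onDisk)); // prints [_0]
    }
}
```

In my case this kind of comparison is what shows the 0-live-doc segments as no longer referenced but still present on disk until the core is reloaded.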