Hi Rahul,

What you're describing sounds similar to index rearranging [1], although in that case the reindexing is done in a new index. The last commit in the IndexRearranger class added support for reading and reindexing deletes - maybe having a look at that and at the Javadoc would help?

Stefan

[1] https://github.com/apache/lucene/blob/d1c353116157d0375de9d673ae5e9c90524ffe2f/lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java

On Wed, 30 Aug 2023 at 15:19, Rahul Goswami <rahul196...@gmail.com> wrote:

> Thanks for the response Mikhail. I don't think I am looking for
> forceMergeDeletes() though, since it could be more expensive than I would
> like and I only want to see the unreferenced segments with 0 live docs to
> be deleted. Just the way they get deleted with a commit=true option or even
> softDelete.
>
> Another piece of important information that I missed out earlier is that
> when I examine the segments referenced by the segments_* files these
> segments (with 0 live docs) are no longer part of it, but they are still
> not cleared. Would appreciate more lines of thought!
>
> Thanks,
> Rahul
>
> On Tue, Aug 29, 2023 at 2:46 AM Mikhail Khludnev <m...@apache.org> wrote:
>
> > Hi Rahul.
> > Are you looking for
> > https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()
> > ?
> >
> > On Tue, Aug 29, 2023 at 5:20 AM Rahul Goswami <rahul196...@gmail.com> wrote:
> >
> > > Hello,
> > > I am trying to execute a program to read documents segment-by-segment and
> > > reindex to the same index. I am reading using Lucene apis and indexing
> > > using solr api (in a core that is currently loaded).
> > >
> > > What I am observing is that even after a segment has been fully processed
> > > and an autoCommit (as well as autoSoftCommit) has kicked in, the segment
> > > with 0 live docs gets left behind. *Upon Solr restart, the segment does get
> > > cleared successfully.*
> > >
> > > I tried to replicate the same thing without the code by indexing 3 docs on an
> > > empty test core, and then reindexing the same docs. The older segment gets
> > > deleted as soon as softCommit interval hits or an explicit commit=true is
> > > called.
> > >
> > > Here are the two approaches that I have tried. Approach 2 is inspired by
> > > the merge logic of accessing segments in case opening a DirectoryReader
> > > (Approach 1) externally is causing this issue.
> > >
> > > But both approaches leave undeleted segments behind until I restart Solr
> > > and load the core again. What am I missing? I don't have any more brain
> > > cells left to fry on this!
> > >
> > > Approach 1:
> > > =========
> > > try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
> > >      IndexReader reader = DirectoryReader.open(dir)) {
> > >   for (LeafReaderContext lrc : reader.leaves()) {
> > >
> > >     // read live docs from each leaf, create a SolrInputDocument
> > >     // out of each Document and index using Solr api
> > >
> > >   }
> > > } catch (Exception e) {
> > >
> > > }
> > >
> > > Approach 2:
> > > ==========
> > > ReadersAndUpdates rld = null;
> > > SegmentReader segmentReader = null;
> > > RefCounted<IndexWriter> iwRef =
> > >     core.getSolrCoreState().getIndexWriter(core);
> > > IndexWriter iw = iwRef.get();
> > > try {
> > >   for (SegmentCommitInfo sci : segmentInfos) {
> > >     rld = iw.getPooledInstance(sci, true);
> > >     segmentReader = rld.getReader(IOContext.READ);
> > >
> > >     // process all live docs similar to above using the segmentReader.
> > >
> > >     rld.release(segmentReader);
> > >     iw.release(rld);
> > >   }
> > > } finally {
> > >   if (iwRef != null) {
> > >     iwRef.decref();
> > >   }
> > > }
> > >
> > > Help would be much appreciated!
> > >
> > > Thanks,
> > > Rahul
> > >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
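
For reference, the "read live docs from each leaf" step that both approaches elide usually comes down to walking doc IDs 0..maxDoc-1 and skipping any ID whose live-docs bit is clear, where a null live-docs set means the segment has no deletions. The sketch below models only that loop in plain Java, using java.util.BitSet as a stand-in for Lucene's live-docs bits; the class name LiveDocsSketch and the method liveDocIds are hypothetical, and no Lucene or Solr calls are made.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical stand-in for the per-segment loop; not Lucene or Solr code.
final class LiveDocsSketch {

    // Returns the IDs of documents that are still live in one segment.
    // liveDocs == null models "this segment has no deletions".
    static List<Integer> liveDocIds(int maxDoc, BitSet liveDocs) {
        List<Integer> ids = new ArrayList<>();
        for (int docId = 0; docId < maxDoc; docId++) {
            if (liveDocs == null || liveDocs.get(docId)) {
                ids.add(docId); // a real program would load and reindex this doc
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        BitSet live = new BitSet(4);
        live.set(0);
        live.set(2);
        System.out.println(liveDocIds(4, live));          // docs 1 and 3 are deleted
        System.out.println(liveDocIds(3, null));          // no deletions at all
        System.out.println(liveDocIds(2, new BitSet(2))); // segment with 0 live docs
    }
}
```

The last call corresponds to the situation described in the thread: a fully reindexed segment yields an empty list, yet whether its files are removed is decided elsewhere, so this loop by itself says nothing about why the segment lingers.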