Thanks for the response, Mikhail. I don't think forceMergeDeletes() is what
I am looking for, though: it could be more expensive than I would like, and
I only want the unreferenced segments with 0 live docs to be deleted, the
same way they get deleted with a commit=true option or even a soft commit.

Another important piece of information that I left out earlier: when I
examine the segments referenced by the segments_* files, these segments
(with 0 live docs) are no longer listed there, yet their files are still
not cleared from the index directory. Would appreciate more lines of thought!
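
A minimal sketch of that check, for anyone who wants to reproduce it. It assumes the usual Lucene file naming, where every per-segment file starts with the segment name (for example `_0`) followed by `.` or `_`. The class and method names are hypothetical, and the set of live segment names has to be supplied by the caller (for instance, taken from the latest commit point read through Lucene); the sketch itself only compares names and does not touch the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical helper: lists index files whose segment is not in the live set.
public class OrphanSegmentFiles {

    // Returns the segment name ("_0", "_a3", ...) for a per-segment file, or
    // null for files that do not belong to a segment (segments_N, write.lock).
    static String segmentName(String fileName) {
        if (!fileName.startsWith("_")) {
            return null;
        }
        int end = fileName.length();
        // The segment name ends at the first '_' or '.' after the leading '_'.
        for (int i = 1; i < fileName.length(); i++) {
            char c = fileName.charAt(i);
            if (c == '_' || c == '.') {
                end = i;
                break;
            }
        }
        return fileName.substring(0, end);
    }

    // Keeps only the files whose segment name is missing from liveSegments.
    static List<String> unreferenced(List<String> directoryListing, Set<String> liveSegments) {
        List<String> orphans = new ArrayList<>();
        for (String file : directoryListing) {
            String segment = segmentName(file);
            if (segment != null && !liveSegments.contains(segment)) {
                orphans.add(file);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Made-up directory listing: segment _0 was fully reindexed, _1 is live.
        List<String> listing = List.of(
                "segments_3", "write.lock", "_0.cfs", "_0.si", "_1.cfs", "_1_1.liv");
        System.out.println(unreferenced(listing, Set.of("_1"))); // prints [_0.cfs, _0.si]
    }
}
```

Any file this reports after a commit belongs to a segment that the commit point no longer references but that has not been removed from disk.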

Thanks,
Rahul

On Tue, Aug 29, 2023 at 2:46 AM Mikhail Khludnev <m...@apache.org> wrote:

> Hi Rahul.
> Are you looking for
>
> https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()
> ?
>
> On Tue, Aug 29, 2023 at 5:20 AM Rahul Goswami <rahul196...@gmail.com>
> wrote:
>
> > Hello,
> > I am trying to execute a program to read documents segment-by-segment and
> > reindex them into the same index. I am reading using the Lucene APIs and
> > indexing using the Solr API (in a core that is currently loaded).
> >
> > What I am observing is that even after a segment has been fully processed
> > and an autoCommit (as well as autoSoftCommit ) has kicked in, the segment
> > with 0 live docs gets left behind. *Upon Solr restart, the segment does
> > get cleared successfully.*
> >
> > I tried to replicate the same thing without the code by indexing 3 docs on
> > an empty test core, and then reindexing the same docs. The older segment
> > gets deleted as soon as the softCommit interval hits or an explicit
> > commit=true is called.
> >
> > Here are the two approaches that I have tried. Approach 2 is inspired by
> > the merge logic of accessing segments in case opening a DirectoryReader
> > (Approach 1) externally is causing this issue.
> >
> > But both approaches leave undeleted segments behind until I restart Solr
> > and load the core again. What am I missing? I don't have any more brain
> > cells left to fry on this!
> >
> > Approach 1:
> > =========
> > try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
> >                     IndexReader reader = DirectoryReader.open(dir)) {
> >                 for (LeafReaderContext lrc : reader.leaves()) {
> >
> >                        // read live docs from each leaf, create a
> >                        // SolrInputDocument out of each Document, and
> >                        // index it using the Solr API
> >
> >                 }
> > }catch(Exception e){
> >
> > }
> >
> > Approach 2:
> > ==========
> > ReadersAndUpdates rld = null;
> > SegmentReader segmentReader = null;
> > RefCounted<IndexWriter> iwRef =
> > core.getSolrCoreState().getIndexWriter(core);
> > IndexWriter iw = iwRef.get();
> > try {
> >   for (SegmentCommitInfo sci : segmentInfos) {
> >     rld = iw.getPooledInstance(sci, true);
> >     segmentReader = rld.getReader(IOContext.READ);
> >
> >     // process all live docs similar to above using the segmentReader
> >
> >     rld.release(segmentReader);
> >     iw.release(rld);
> >   }
> > } finally {
> >    if (iwRef != null) {
> >        iwRef.decref();
> >     }
> > }
> >
> > Help would be much appreciated!
> >
> > Thanks,
> > Rahul
> >
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>
