Hi Rahul,

What you're describing sounds similar to index rearranging [1], although in
that case the documents are reindexed into a new index. The last commit to the
IndexRearranger class added support for reading and reindexing deletes, so
having a look at that and at the Javadoc might help.


Stefan

[1]
https://github.com/apache/lucene/blob/d1c353116157d0375de9d673ae5e9c90524ffe2f/lucene/misc/src/java/org/apache/lucene/misc/index/IndexRearranger.java


On Wed, 30 Aug 2023 at 15:19, Rahul Goswami <rahul196...@gmail.com> wrote:

> Thanks for the response, Mikhail. I don't think I am looking for
> forceMergeDeletes(), though, since it could be more expensive than I would
> like, and I only want the unreferenced segments with 0 live docs to
> be deleted, just the way they get deleted with a commit=true option or even
> softDelete.
>
> Another important piece of information that I left out earlier: when I
> examine the segments referenced by the segments_* files, these segments
> (with 0 live docs) are no longer listed there, but they are still
> not cleared. Would appreciate more lines of thought!
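
A quick way to cross-check that observation from outside Solr is to group the
files in the index directory by segment name and compare the result with what
segments_* references. The sketch below is plain Java with no Lucene
dependency. It assumes the usual Lucene file naming, where every per-segment
file starts with an underscore followed by an alphanumeric segment name (for
example _0.cfs, _0_1.liv, _a3_Lucene90_0.doc). The class SegmentFileLister and
its methods are hypothetical helpers, not part of Lucene or Solr.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic helper: groups index files by segment name.
final class SegmentFileLister {

    // Returns the segment name ("_0", "_a3") for a per-segment file, or null
    // for anything else (segments_N, write.lock, ...).
    // Assumption: segment names are "_" followed by letters and digits only.
    static String segmentNameOf(String fileName) {
        if (!fileName.startsWith("_") || fileName.length() < 2) {
            return null;
        }
        int end = 1;
        while (end < fileName.length()
                && Character.isLetterOrDigit(fileName.charAt(end))) {
            end++;
        }
        return end > 1 ? fileName.substring(0, end) : null;
    }

    // Maps each segment name found in indexDir to its sorted file names.
    static Map<String, List<String>> groupBySegment(Path indexDir) throws IOException {
        Map<String, List<String>> bySegment = new TreeMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path p : files) {
                String name = p.getFileName().toString();
                String segment = segmentNameOf(name);
                if (segment != null) {
                    bySegment.computeIfAbsent(segment, k -> new ArrayList<>()).add(name);
                }
            }
        }
        for (List<String> names : bySegment.values()) {
            Collections.sort(names);
        }
        return bySegment;
    }

    public static void main(String[] args) throws IOException {
        // The index directory is passed as the first argument; "." is a placeholder.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        groupBySegment(indexDir)
                .forEach((segment, files) -> System.out.println(segment + " -> " + files));
    }
}
```

Run with the core's index directory as the only argument, it prints one line
per segment name. Names that show up here but are absent from the latest
segments_N file would be the leftover segments in question.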
>
> Thanks,
> Rahul
>
> On Tue, Aug 29, 2023 at 2:46 AM Mikhail Khludnev <m...@apache.org> wrote:
>
> > Hi Rahul.
> > Are you looking for
> >
> > https://lucene.apache.org/core/9_0_0/core/org/apache/lucene/index/IndexWriter.html#forceMergeDeletes()
> > ?
> >
> > On Tue, Aug 29, 2023 at 5:20 AM Rahul Goswami <rahul196...@gmail.com>
> > wrote:
> >
> > > Hello,
> > > I am trying to execute a program to read documents segment-by-segment and
> > > reindex to the same index. I am reading using Lucene APIs and indexing
> > > using the Solr API (in a core that is currently loaded).
> > >
> > > What I am observing is that even after a segment has been fully processed
> > > and an autoCommit (as well as autoSoftCommit) has kicked in, the segment
> > > with 0 live docs gets left behind. *Upon Solr restart, the segment does get
> > > cleared successfully.*
> > >
> > > I tried to replicate the same thing without the code by indexing 3 docs on an
> > > empty test core, and then reindexing the same docs. The older segment gets
> > > deleted as soon as the softCommit interval hits or an explicit commit=true is
> > > called.
> > >
> > > Here are the two approaches that I have tried. Approach 2 is inspired by
> > > the merge logic of accessing segments, in case opening a DirectoryReader
> > > externally (Approach 1) is causing this issue.
> > >
> > > But both approaches leave undeleted segments behind until I restart Solr
> > > and load the core again. What am I missing? I don't have any more brain
> > > cells left to fry on this!
> > >
> > > Approach 1:
> > > =========
> > > try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
> > >      IndexReader reader = DirectoryReader.open(dir)) {
> > >   for (LeafReaderContext lrc : reader.leaves()) {
> > >     // Read the live docs from each leaf, create a SolrInputDocument
> > >     // out of each Document, and index it using the Solr API.
> > >   }
> > > } catch (Exception e) {
> > >   // exceptions are swallowed here
> > > }
> > >
> > > Approach 2:
> > > ==========
> > > ReadersAndUpdates rld = null;
> > > SegmentReader segmentReader = null;
> > > RefCounted<IndexWriter> iwRef =
> > >     core.getSolrCoreState().getIndexWriter(core);
> > > IndexWriter iw = iwRef.get();
> > > try {
> > >   for (SegmentCommitInfo sci : segmentInfos) {
> > >     rld = iw.getPooledInstance(sci, true);
> > >     segmentReader = rld.getReader(IOContext.READ);
> > >
> > >     // Process all live docs similar to above, using the segmentReader.
> > >
> > >     rld.release(segmentReader);
> > >     iw.release(rld);
> > >   }
> > > } finally {
> > >   if (iwRef != null) {
> > >     iwRef.decref();
> > >   }
> > > }
> > >
> > > Help would be much appreciated!
> > >
> > > Thanks,
> > > Rahul
> > >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> >
>
