Uwe,
Thanks for the response. I have openSearcher=false in autoCommit, but I do
have an autoSoftCommit interval of 5 minutes configured as well which
should open a searcher.
In vanilla Solr, without my code, I see that if I completely reindex all
documents in a segment (via a client call), the segment does get deleted
after the soft commit interval. However, if I process the segments as per
Approach-1 in my original email, I see that the 0-doc 7.x segment stays
even after the process finishes, i.e., even after I exit the
try-with-resources block. Note that my index is a mix of 7.x and 8.x
segments and I am only reindexing 7.x segments by preventing them from
participating in merge via a custom MergePolicy.
Additionally, as mentioned, Solr provides a handler (<core>/admin/segments)
which does what Luke does and it shows that by the end of the process there
are no more 7.x segments as referenced by the segments_x file. But for some
reason the physical 7.x segment files continue to stay behind until I
restart Solr.
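
To make the refCount question concrete, below is a toy Java model of the
general idea that a file is removed only once nothing references it any
more. It is only a sketch of my understanding and not Lucene's
IndexFileDeleter: the class RefCountedFilesSketch, its methods, and the
file names _0.cfs / _0.si are all made up for illustration. Whether the
extra reference in my case comes from my own reader or from Solr's
searcher is exactly what I have not figured out yet.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of reference-counted index files (hypothetical, not Lucene code). */
class RefCountedFilesSketch {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> filesOnDisk = new HashSet<>();

    /** A commit point or an open reader starts referencing these files. */
    void incRef(List<String> files) {
        for (String f : files) {
            filesOnDisk.add(f);
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    /** A reference goes away; a file is deleted only when its count reaches zero. */
    void decRef(List<String> files) {
        for (String f : files) {
            int remaining = refCounts.merge(f, -1, Integer::sum);
            if (remaining <= 0) {
                refCounts.remove(f);
                filesOnDisk.remove(f);
            }
        }
    }

    boolean exists(String file) {
        return filesOnDisk.contains(file);
    }

    public static void main(String[] args) {
        RefCountedFilesSketch deleter = new RefCountedFilesSketch();
        // Hypothetical files of a 7.x segment whose docs have all been reindexed.
        List<String> oldSegment = Arrays.asList("_0.cfs", "_0.si");

        deleter.incRef(oldSegment); // referenced by the current commit point
        deleter.incRef(oldSegment); // referenced by a reader opened on that commit

        deleter.decRef(oldSegment); // a newer commit no longer lists the segment
        System.out.println(deleter.exists("_0.cfs")); // prints true: reader still holds it

        deleter.decRef(oldSegment); // the reader is finally closed
        System.out.println(deleter.exists("_0.cfs")); // prints false: count reached zero
    }
}
```

In this model the segment files survive the commit and go away only when
the last holder releases them, which matches the symptom of the files
disappearing only after a restart.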

Thanks,
Rahul

On Mon, Sep 4, 2023 at 7:18 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> in Solr the empty segment stays open as long as there is a Searcher
> still open. At some point the empty segment (100% deletions) will be
> deleted, but you have to wait until SolrIndexSearcher has restarted.
> Maybe check your solrconfig.xml and check if openSearcher is enabled
> after autoSoftCommit:
>
> https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html
>
> Uwe
>
> On 31.08.2023 at 21:35, Rahul Goswami wrote:
> > Stefan, Mike,
> > Appreciate your responses! I spent some time analyzing your inputs and
> > going further down the rabbit hole.
> >
> > Stefan,
> > I looked at the IndexRearranger code you referenced where it tries to drop
> > the segment. I see that it eventually gets handled via
> > IndexFileDeleter.checkpoint() through file refCounts (=0 for deletion
> > criteria). The same method also gets called as part of the
> > IndexWriter.commit() flow (inside finishCommit()). So in an ideal scenario
> > a commit should have taken care of dropping the segment files. That tells
> > me the refCounts for the files are not getting set to 0. I have a fair
> > suspicion that the reindexing process running on the same index inside the
> > same JVM has something to do with it.
> >
> > Mike,
> > Thanks for the caution on Approach 2... good to at least be able to
> > continue on one train of thought. As mentioned in my response to Stefan,
> > the reindexing is going on *inside* of the Solr JVM as an asynchronous
> > thread and not as a separate process. So I believe the open reader you are
> > alluding to might be the one I am opening through DirectoryReader.open()
> > (?). However, looking at the code, I am seeing IndexFileDeleter.incRef()
> > only on the files in SegmentCommitInfos.
> >
> > Does an incRef() also happen when an IndexReader is opened ?
> >
> > Note: The index is a mix of 7.x and 8.x segments (on Solr 8.x). By
> > extending TieredMergePolicy (TMP) and overriding findMerges() I am
> > preventing 7.x segments from participating in merges, and the code only
> > reindexes these 7.x segments into the same index, segment-by-segment.
> > In the current tests I am performing, there are no parallel search or
> > indexing threads through an external request. The reindexing is the only
> > process interacting with the index. The goal is to eventually have this
> > running alongside any parallel indexing/search requests on the index.
> > Also, as noted earlier, by inspecting the SegmentInfos, I can see the 7.x
> > segment progressively reducing, but the files never get cleared.
> >
> > If it is my reader that is throwing off the refCount for Solr, what could
> > be another way of reading the index without bloating it up with 0 doc
> > segments?
> >
> > I will also try floating this in the Solr list to get answers to some of
> > the questions you pose around Solr's handling of readers.
> >
> > Thanks,
> > Rahul
> >
> >
> >
> >
> > On Thu, Aug 31, 2023 at 6:48 AM Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> Hi Rahul,
> >>
> >> Please do not pursue Approach 2 :)  ReadersAndUpdates.release is not
> >> something the application should be calling.  This path can only lead to
> >> pain.
> >>
> >> It sounds to me like something in Solr is holding an old reader (maybe the
> >> last commit point, or the reader prior to the refresh after you re-indexed
> >> all docs in a given now 100% deleted segment) open.
> >>
> >> Does Solr keep old readers open, older than the most recent commit?  Do
> >> you have queries in flight that might be holding the old reader open?
> >>
> >> Given that your small by-hand test case (3 docs) correctly showed the 100%
> >> deleted segment being reclaimed after the soft commit interval or a manual
> >> hard commit, something must be different in the larger use case that is
> >> causing Solr to keep an old reader still open.  Is there any logging you
> >> can enable to understand Solr's handling of its IndexReaders' lifecycle?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <rahul196...@gmail.com>
> >> wrote:
> >>
> >>> Hello,
> >>> I am trying to execute a program to read documents segment-by-segment and
> >>> reindex to the same index. I am reading using Lucene APIs and indexing
> >>> using the Solr API (in a core that is currently loaded).
> >>>
> >>> What I am observing is that even after a segment has been fully processed
> >>> and an autoCommit (as well as autoSoftCommit) has kicked in, the segment
> >>> with 0 live docs gets left behind. *Upon Solr restart, the segment does
> >>> get cleared successfully.*
> >>>
> >>> I tried to replicate the same thing without the code by indexing 3 docs on
> >>> an empty test core, and then reindexing the same docs. The older segment
> >>> gets deleted as soon as the softCommit interval hits or an explicit
> >>> commit=true is called.
> >>>
> >>> Here are the two approaches that I have tried. Approach 2 is inspired by
> >>> the merge logic of accessing segments, in case opening a DirectoryReader
> >>> (Approach 1) externally is causing this issue.
> >>>
> >>> But both approaches leave undeleted segments behind until I restart Solr
> >>> and load the core again. What am I missing? I don't have any more brain
> >>> cells left to fry on this!
> >>>
> >>> Approach 1:
> >>> =========
> >>> try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
> >>>      IndexReader reader = DirectoryReader.open(dir)) {
> >>>     for (LeafReaderContext lrc : reader.leaves()) {
> >>>         // read live docs from each leaf, create a SolrInputDocument
> >>>         // out of each Document, and index it using the Solr API
> >>>     }
> >>> } catch (Exception e) {
> >>>     // exception ignored
> >>> }
> >>>
> >>> Approach 2:
> >>> ==========
> >>> ReadersAndUpdates rld = null;
> >>> SegmentReader segmentReader = null;
> >>> RefCounted<IndexWriter> iwRef =
> >>>     core.getSolrCoreState().getIndexWriter(core);
> >>> IndexWriter iw = iwRef.get();
> >>> try {
> >>>     for (SegmentCommitInfo sci : segmentInfos) {
> >>>         rld = iw.getPooledInstance(sci, true);
> >>>         segmentReader = rld.getReader(IOContext.READ);
> >>>
> >>>         // process all live docs similar to above using the segmentReader
> >>>
> >>>         rld.release(segmentReader);
> >>>         iw.release(rld);
> >>>     }
> >>> } finally {
> >>>     if (iwRef != null) {
> >>>         iwRef.decref();
> >>>     }
> >>> }
> >>>
> >>> Help would be much appreciated!
> >>>
> >>> Thanks,
> >>> Rahul
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
>
>
