Uwe,

Thanks for the response. I have openSearcher=false in autoCommit, but I also have an autoSoftCommit interval of 5 minutes configured, which should open a searcher. In vanilla Solr, without my code, I see that if I completely reindex all documents in a segment (via a client call), the segment does get deleted after the soft commit interval. However, if I process the segments as per Approach 1 in my original email, the 0-doc 7.x segment stays even after the process finishes, i.e., even after I exit the try-with-resources block.

Note that my index is a mix of 7.x and 8.x segments, and I am only reindexing the 7.x segments, which I keep out of merges via a custom MergePolicy. Additionally, as mentioned, Solr provides a handler (<core>/admin/segments) which does what Luke does, and it shows that by the end of the process there are no more 7.x segments referenced by the segments_x file. But for some reason the physical 7.x segment files stay behind until I restart Solr.

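To make the refcount question concrete, here is a minimal toy model in plain Java of how reference-counted file deletion could produce this symptom. It is only a sketch under my own assumptions: RefCountSketch, incRef() and decRef() below are hypothetical names, not Lucene's IndexFileDeleter, and the file names are made up. It shows that a segment dropped from the latest commit stays on disk for as long as any other holder (for example an open searcher) still references its files.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: NOT Lucene's IndexFileDeleter, just the general idea. */
class RefCountSketch {
    // file name -> number of holders (commit points, open readers) using it
    private final Map<String, Integer> refCounts = new HashMap<>();

    /** A commit point or an open reader takes one reference per file. */
    void incRef(List<String> files) {
        for (String f : files) {
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    /** Drops one reference per file and returns the files that hit zero. */
    List<String> decRef(List<String> files) {
        List<String> deleted = new ArrayList<>();
        for (String f : files) {
            Integer current = refCounts.get(f);
            if (current == null) {
                throw new IllegalStateException("not referenced: " + f);
            }
            if (current == 1) {
                refCounts.remove(f);   // last holder gone: file can be deleted
                deleted.add(f);
            } else {
                refCounts.put(f, current - 1);
            }
        }
        return deleted;
    }

    boolean exists(String file) {
        return refCounts.containsKey(file);
    }

    public static void main(String[] args) {
        RefCountSketch deleter = new RefCountSketch();
        List<String> oldSegment = List.of("_7.si", "_7.cfs"); // made-up names

        deleter.incRef(oldSegment); // referenced by the current commit point
        deleter.incRef(oldSegment); // referenced by an open searcher/reader

        // A new commit no longer lists the segment, but the searcher still does.
        System.out.println(deleter.decRef(oldSegment)); // prints []

        // Only when the old searcher is closed do the files become deletable.
        System.out.println(deleter.decRef(oldSegment)); // prints [_7.si, _7.cfs]
    }
}
```

If the real behavior resembles this model, the files I see lingering would point to some holder that never releases its reference, which is what I am trying to track down.
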
Thanks,
Rahul

On Mon, Sep 4, 2023 at 7:18 AM Uwe Schindler <u...@thetaphi.de> wrote:

> Hi,
>
> in Solr the empty segment stays open as long as there is a Searcher
> still open. At some point the empty segment (100% deletions) will be
> deleted, but you have to wait until SolrIndexSearcher has restarted.
> Maybe check your solrconfig.xml and check if openSearcher is enabled
> after autoSoftCommit:
>
> https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html
>
> Uwe
>
> On 31.08.2023 at 21:35, Rahul Goswami wrote:
> > Stefan, Mike,
> > Appreciate your responses! I spent some time analyzing your inputs and
> > going further down the rabbit hole.
> >
> > Stefan,
> > I looked at the IndexRearranger code you referenced where it tries to
> > drop the segment. I see that it eventually gets handled via
> > IndexFileDeleter.checkpoint() through file refCounts (=0 for deletion
> > criteria). The same method also gets called as part of the
> > IndexWriter.commit() flow (inside finishCommit()). So in an ideal
> > scenario a commit should have taken care of dropping the segment
> > files. That tells me the refCounts for the files are not getting set
> > to 0. I have a fair suspicion that the reindexing process running on
> > the same index inside the same JVM has something to do with it.
> >
> > Mike,
> > Thanks for the caution on Approach 2... good to at least be able to
> > continue on one train of thought. As mentioned in my response to
> > Stefan, the reindexing is going on *inside* of the Solr JVM as an
> > asynchronous thread and not as a separate process. So I believe the
> > open reader you are alluding to might be the one I am opening through
> > DirectoryReader.open() (?). However, looking at the code, I am seeing
> > IndexFileDeleter.incRef() only on the files in SegmentCommitInfos.
> >
> > Does an incRef() also happen when an IndexReader is opened?
> >
> > Note: The index is a mix of 7.x and 8.x segments (on Solr 8.x).
> > By extending TMP and overriding findMerges() I am preventing 7.x
> > segments from participating in merges, and the code only reindexes
> > these 7.x segments into the same index, segment-by-segment.
> > In the current tests I am performing, there are no parallel search or
> > indexing threads through an external request. The reindexing is the
> > only process interacting with the index. The goal is to eventually
> > have this running alongside any parallel indexing/search requests on
> > the index.
> > Also, as noted earlier, by inspecting the SegmentInfos, I can see the
> > 7.x segment progressively reducing, but the files never get cleared.
> >
> > If it is my reader that is throwing off the refCount for Solr, what
> > could be another way of reading the index without bloating it up with
> > 0-doc segments?
> >
> > I will also try floating this in the Solr list to get answers to some
> > of the questions you pose around Solr's handling of readers.
> >
> > Thanks,
> > Rahul
> >
> > On Thu, Aug 31, 2023 at 6:48 AM Michael McCandless <
> > luc...@mikemccandless.com> wrote:
> >
> >> Hi Rahul,
> >>
> >> Please do not pursue Approach 2 :) ReadersAndUpdates.release is not
> >> something the application should be calling. This path can only lead
> >> to pain.
> >>
> >> It sounds to me like something in Solr is holding an old reader open
> >> (maybe the last commit point, or the reader prior to the refresh
> >> after you re-indexed all docs in a given, now 100% deleted, segment).
> >>
> >> Does Solr keep old readers open, older than the most recent commit?
> >> Do you have queries in flight that might be holding the old reader
> >> open?
> >>
> >> Given that your small by-hand test case (3 docs) correctly showed the
> >> 100% deleted segment being reclaimed after the soft commit interval
> >> or a manual hard commit, something must be different in the larger
> >> use case that is causing Solr to keep an old reader still open.
> >> Is there any logging you can enable to understand Solr's handling of
> >> its IndexReaders' lifecycle?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <rahul196...@gmail.com>
> >> wrote:
> >>
> >>> Hello,
> >>> I am trying to execute a program to read documents segment-by-segment
> >>> and reindex them into the same index. I am reading using Lucene APIs
> >>> and indexing using the Solr API (in a core that is currently loaded).
> >>>
> >>> What I am observing is that even after a segment has been fully
> >>> processed and an autoCommit (as well as an autoSoftCommit) has
> >>> kicked in, the segment with 0 live docs gets left behind. *Upon Solr
> >>> restart, the segment does get cleared successfully.*
> >>>
> >>> I tried to replicate the same thing without the code by indexing 3
> >>> docs on an empty test core, and then reindexing the same docs. The
> >>> older segment gets deleted as soon as the softCommit interval hits
> >>> or an explicit commit=true is called.
> >>>
> >>> Here are the two approaches that I have tried. Approach 2 is inspired
> >>> by the merge logic of accessing segments, in case opening a
> >>> DirectoryReader (Approach 1) externally is causing this issue.
> >>>
> >>> But both approaches leave undeleted segments behind until I restart
> >>> Solr and load the core again. What am I missing? I don't have any
> >>> more brain cells left to fry on this!
> >>>
> >>> Approach 1:
> >>> =========
> >>> try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
> >>>      IndexReader reader = DirectoryReader.open(dir)) {
> >>>   for (LeafReaderContext lrc : reader.leaves()) {
> >>>
> >>>     // read live docs from each leaf, create a SolrInputDocument
> >>>     // out of each Document and index it using the Solr API
> >>>
> >>>   }
> >>> } catch (Exception e) {
> >>>
> >>> }
> >>>
> >>> Approach 2:
> >>> ==========
> >>> ReadersAndUpdates rld = null;
> >>> SegmentReader segmentReader = null;
> >>> RefCounted<IndexWriter> iwRef =
> >>>     core.getSolrCoreState().getIndexWriter(core);
> >>> IndexWriter iw = iwRef.get();
> >>> try {
> >>>   for (SegmentCommitInfo sci : segmentInfos) {
> >>>     rld = iw.getPooledInstance(sci, true);
> >>>     segmentReader = rld.getReader(IOContext.READ);
> >>>
> >>>     // process all live docs similar to above using the segmentReader
> >>>
> >>>     rld.release(segmentReader);
> >>>     iw.release(rld);
> >>>   }
> >>> } finally {
> >>>   if (iwRef != null) {
> >>>     iwRef.decref();
> >>>   }
> >>> }
> >>>
> >>> Help would be much appreciated!
> >>>
> >>> Thanks,
> >>> Rahul
> >>>
> --
> Uwe Schindler
> Achterdiek 19, D-28357 Bremen
> https://www.thetaphi.de
> eMail: u...@thetaphi.de
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org