Hi,

In Solr, the empty segment stays open as long as a Searcher is still open on it. At some point the empty segment (100% deletions) will be deleted, but you have to wait until SolrIndexSearcher has been restarted. Maybe check your solrconfig.xml and see whether openSearcher is enabled after autoSoftCommit: https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html
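
To illustrate the idea, here is a toy model in plain Java. It is only a sketch of reference counting and is not Lucene's actual IndexFileDeleter; the class name RefCountedFiles, its methods, and the file names are all made up for this example. It shows why the files of a fully deleted segment can survive a commit while a searcher still holds a reference to them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model only: NOT Lucene's IndexFileDeleter, just the ref-counting idea. */
class RefCountedFiles {
    private final Map<String, Integer> refCounts = new HashMap<>();
    private final Set<String> onDisk = new TreeSet<>();

    /** A holder (commit point or searcher) starts referencing these files. */
    void incRef(List<String> files) {
        for (String f : files) {
            onDisk.add(f);
            refCounts.merge(f, 1, Integer::sum);
        }
    }

    /** A holder lets go; a file is removed only when its count drops to 0. */
    void decRef(List<String> files) {
        for (String f : files) {
            int remaining = refCounts.merge(f, -1, Integer::sum);
            if (remaining <= 0) {
                refCounts.remove(f);
                onDisk.remove(f);
            }
        }
    }

    /** Snapshot of the files that currently exist in this model. */
    Set<String> filesOnDisk() {
        return new TreeSet<>(onDisk);
    }

    public static void main(String[] args) {
        // Hypothetical file names for an old (7.x) and a new (8.x) segment.
        List<String> oldSegment = Arrays.asList("_0.cfs", "_0.si");
        List<String> newSegment = Arrays.asList("_1.cfs", "_1.si");

        RefCountedFiles files = new RefCountedFiles();
        files.incRef(oldSegment); // commit 1 references _0
        files.incRef(oldSegment); // a searcher is opened on commit 1
        files.incRef(newSegment); // commit 2 references only _1 ...
        files.decRef(oldSegment); // ... so the old commit point releases _0

        // The searcher still holds _0, so its files are still present.
        System.out.println(files.filesOnDisk());

        files.decRef(oldSegment); // the old searcher is closed
        // Only now do the _0 files disappear.
        System.out.println(files.filesOnDisk());
    }
}
```

In this model the commit alone never brings the count for the old files to zero; that only happens once the last holder (here, the old searcher) releases them.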

Uwe

On 31.08.2023 at 21:35, Rahul Goswami wrote:
Stefan, Mike,
Appreciate your responses! I spent some time analyzing your inputs and
going further down the rabbit hole.

Stefan,
I looked at the IndexRearranger code you referenced where it tries to drop
the segment. I see that it eventually gets handled via
IndexFileDeleter.checkpoint() through file refCounts (=0 for deletion
criteria). The same method also gets called as part of the
IndexWriter.commit() flow (inside finishCommit()). So in an ideal scenario
a commit should have taken care of dropping the segment files. That tells
me the refCounts for the files are not getting set to 0. I have a fair
suspicion that the reindexing process running on the same index inside the
same JVM has something to do with it.

Mike,
Thanks for the caution on Approach 2... good to at least be able to
continue on one train of thought. As mentioned in my response to Stefan,
the reindexing is going on *inside* of the Solr JVM as an asynchronous
thread and not as a separate process. So I believe the open reader you are
alluding to might be the one I am opening through DirectoryReader.open()
(?). However, looking at the code, I am seeing IndexFileDeleter.incRef()
only on the files in SegmentCommitInfos.

Does an incRef() also happen when an IndexReader is opened?

Note: The index is a mix of 7.x and 8.x segments (on Solr 8.x). By extending
TMP and overriding findMerges() I am preventing 7.x segments from
participating in merges, and the code only reindexes these 7.x segments
into the same index, segment-by-segment.
In the current tests I am performing, there are no parallel search or
indexing threads through an external request. The reindexing is the only
process interacting with the index. The goal is to eventually have this
running alongside any parallel indexing/search requests on the index.
Also, as noted earlier, by inspecting the SegmentInfos, I can see the 7.x
segment progressively shrinking, but the files never get cleared.

If it is my reader that is throwing off the refCount for Solr, what could
be another way of reading the index without bloating it up with 0 doc
segments?

I will also try floating this on the Solr list to get answers to some of
the questions you pose around Solr's handling of readers.

Thanks,
Rahul




On Thu, Aug 31, 2023 at 6:48 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

Hi Rahul,

Please do not pursue Approach 2 :)  ReadersAndUpdates.release is not
something the application should be calling.  This path can only lead to
pain.

It sounds to me like something in Solr is holding an old reader (maybe the
last commit point, or reader prior to the refresh after you re-indexed all
docs in a given now 100% deleted segment) open.

Does Solr keep old readers open, older than the most recent commit?  Do
you have queries in flight that might be holding the old reader open?

Given that your small by-hand test case (3 docs) correctly showed the 100%
deleted segment being reclaimed after the soft commit interval or a manual
hard commit, something must be different in the larger use case that is
causing Solr to keep an old reader open.  Is there any logging you can
enable to understand Solr's handling of its IndexReaders' lifecycle?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <rahul196...@gmail.com>
wrote:

Hello,
I am trying to execute a program to read documents segment-by-segment and
reindex to the same index. I am reading using Lucene APIs and indexing
using the Solr API (in a core that is currently loaded).

What I am observing is that even after a segment has been fully processed
and an autoCommit (as well as autoSoftCommit) has kicked in, the segment
with 0 live docs gets left behind. *Upon Solr restart, the segment does
get cleared successfully.*

I tried to replicate the same thing without the code by indexing 3 docs on an
empty test core, and then reindexing the same docs. The older segment gets
deleted as soon as softCommit interval hits or an explicit commit=true is
called.

Here are the two approaches that I have tried. Approach 2 is inspired by
the merge logic of accessing segments in case opening a DirectoryReader
(Approach 1) externally is causing this issue.

But both approaches leave undeleted segments behind until I restart Solr
and load the core again. What am I missing? I don't have any more brain
cells left to fry on this!

Approach 1:
=========
try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
     IndexReader reader = DirectoryReader.open(dir)) {
    for (LeafReaderContext lrc : reader.leaves()) {
        // read live docs from each leaf, create a SolrInputDocument
        // out of each Document and index using the Solr API
    }
} catch (Exception e) {
    // exceptions are swallowed here
}

Approach 2:
==========
ReadersAndUpdates rld = null;
SegmentReader segmentReader = null;
RefCounted<IndexWriter> iwRef = core.getSolrCoreState().getIndexWriter(core);
IndexWriter iw = iwRef.get();
try {
    for (SegmentCommitInfo sci : segmentInfos) {
        rld = iw.getPooledInstance(sci, true);
        segmentReader = rld.getReader(IOContext.READ);

        // process all live docs similar to above using the segmentReader

        rld.release(segmentReader);
        iw.release(rld);
    }
} finally {
    if (iwRef != null) {
        iwRef.decref();
    }
}

Help would be much appreciated!

Thanks,
Rahul

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de

