It looks like your code has a leak and does not close all of the IndexReaders/Writers that your custom code opens inside Solr. It is impossible to review this from the outside.

You should use the Solr-provided SolrIndexWriter and SolrIndexSearcher for your custom code and let Solr manage their lifecycle.
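For example, a rough sketch of the borrowing pattern (assuming your code runs inside a Solr plugin with access to the SolrCore; this is illustrative, not a drop-in implementation):

```java
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.solr.search.SolrIndexSearcher;
import org.apache.solr.util.RefCounted;

// Borrow Solr's managed searcher instead of opening your own
// DirectoryReader; Solr tracks the reference count and can close the
// underlying reader (and drop fully deleted segments) once it is released.
RefCounted<SolrIndexSearcher> searcherRef = core.getSearcher();
try {
  SolrIndexSearcher searcher = searcherRef.get();
  IndexReader reader = searcher.getIndexReader();
  for (LeafReaderContext lrc : reader.leaves()) {
    // read live docs from each leaf and reindex via the Solr API
  }
} finally {
  searcherRef.decref();  // never close the borrowed reader yourself
}
```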

Uwe

Am 10.09.2023 um 04:09 schrieb Rahul Goswami:
Uwe,
Thanks for the response. I have openSearcher=false in autoCommit, but I also have an autoSoftCommit interval of 5 minutes configured, which should open a searcher.
In vanilla Solr, without my code, I see that if I completely reindex all documents in a segment (via a client call), the segment does get deleted after the soft commit interval. However, if I process the segments as per Approach 1 in my original email, I see that the 0-doc 7.x segment stays even after the process finishes, i.e. even after I exit the try-with-resources block. Note that my index is a mix of 7.x and 8.x segments, and I am only reindexing 7.x segments, preventing them from participating in merges via a custom MergePolicy.
Additionally, as mentioned, Solr provides a handler (<core>/admin/segments) which does what Luke does, and it shows that by the end of the process there are no more 7.x segments referenced by the segments_x file. But for some reason the physical 7.x segment files stay behind until I restart Solr.

Thanks,
Rahul

On Mon, Sep 4, 2023 at 7:18 AM Uwe Schindler <u...@thetaphi.de> wrote:

Hi,

in Solr the empty segment is kept open as long as a searcher is still
open on it. At some point the empty segment (100% deletions) will be
deleted, but you have to wait until the SolrIndexSearcher has been
reopened. Maybe check your solrconfig.xml to see whether openSearcher
is enabled after autoSoftCommit:

https://solr.apache.org/guide/solr/latest/configuration-guide/commits-transaction-logs.html
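For reference, the relevant solrconfig.xml fragment typically looks like this (the intervals shown are illustrative, not recommendations):

```xml
<!-- Hard commits flush to disk but do not open a new searcher. -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commits open a new searcher; old segments can only be
     dropped once no searcher references them anymore. -->
<autoSoftCommit>
  <maxTime>300000</maxTime>
</autoSoftCommit>
```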

Uwe

Am 31.08.2023 um 21:35 schrieb Rahul Goswami:
Stefan, Mike,
Appreciate your responses! I spent some time analyzing your inputs and
going further down the rabbit hole.

Stefan,
I looked at the IndexRearranger code you referenced, where it tries to drop the segment. I see that it eventually gets handled via IndexFileDeleter.checkpoint() through file refCounts (refCount = 0 is the deletion criterion). The same method also gets called as part of the IndexWriter.commit() flow (inside finishCommit()). So in an ideal scenario a commit should have taken care of dropping the segment files, which tells me the refCounts for the files are not getting set to 0. I have a fair suspicion that the reindexing process running on the same index inside the same JVM has something to do with it.
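As a toy illustration of that bookkeeping (not actual Lucene code): a file is only deleted once every holder (commit points, pooled readers, etc.) has released its reference.

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration (not Lucene code) of IndexFileDeleter-style refCounts:
// a file is "deleted" only when its reference count drops to 0.
class ToyFileDeleter {
    private final Map<String, Integer> refCounts = new HashMap<>();

    void incRef(String file) {
        refCounts.merge(file, 1, Integer::sum);
    }

    /** Returns true if the file's refCount reached 0 and it was deleted. */
    boolean decRef(String file) {
        int count = refCounts.merge(file, -1, Integer::sum);
        if (count == 0) {
            refCounts.remove(file);
            return true;  // refCount hit 0 -> physically delete the file
        }
        return false;
    }
}
```

If any holder never calls decRef (for example, a reader opened outside of Solr's management that is never accounted for), the count never reaches 0 and the files stay behind, which would match the symptom described above.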

Mike,
Thanks for the caution on Approach 2... good to at least be able to continue on one train of thought. As mentioned in my response to Stefan, the reindexing is going on *inside* the Solr JVM as an asynchronous thread, not as a separate process. So I believe the open reader you are alluding to might be the one I am opening through DirectoryReader.open() (?). However, looking at the code, I only see IndexFileDeleter.incRef() being called on the files in SegmentCommitInfos.

Does an incRef() also happen when an IndexReader is opened?

Note: the index is a mix of 7.x and 8.x segments (on Solr 8.x). By extending TMP and overriding findMerges() I am preventing 7.x segments from participating in merges, and the code only reindexes these 7.x segments into the same index, segment-by-segment.
In the current tests I am performing, there are no parallel search or indexing threads from external requests; the reindexing is the only process interacting with the index. The goal is to eventually have this running alongside any parallel indexing/search requests on the index. Also, as noted earlier, by inspecting the SegmentInfos I can see the 7.x segments progressively shrinking, but the files never get cleared.
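For reference, the findMerges() override described above could look roughly like this (hypothetical class name, sketched against the Lucene 8.x API and untested):

```java
import java.io.IOException;
import org.apache.lucene.index.MergeTrigger;
import org.apache.lucene.index.SegmentCommitInfo;
import org.apache.lucene.index.SegmentInfos;
import org.apache.lucene.index.TieredMergePolicy;

// Hypothetical sketch: a TieredMergePolicy subclass that hides 7.x
// segments from merge selection, so only 8.x segments participate.
public class SkipPre8SegmentsMergePolicy extends TieredMergePolicy {
  @Override
  public MergeSpecification findMerges(MergeTrigger mergeTrigger,
                                       SegmentInfos segmentInfos,
                                       MergeContext mergeContext) throws IOException {
    SegmentInfos eligible = new SegmentInfos(segmentInfos.getIndexCreatedVersionMajor());
    for (SegmentCommitInfo sci : segmentInfos) {
      if (sci.info.getVersion().major >= 8) {  // skip 7.x segments
        eligible.add(sci);
      }
    }
    return super.findMerges(mergeTrigger, eligible, mergeContext);
  }
}
```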

If it is my reader that is throwing off the refCount for Solr, what could
be another way of reading the index without bloating it up with 0 doc
segments?

I will also try floating this on the Solr list to get answers to some of the questions you pose around Solr's handling of readers.

Thanks,
Rahul




On Thu, Aug 31, 2023 at 6:48 AM Michael McCandless <
luc...@mikemccandless.com> wrote:

Hi Rahul,

Please do not pursue Approach 2 :)  ReadersAndUpdates.release is not
something the application should be calling.  This path can only lead to
pain.

It sounds to me like something in Solr is holding an old reader open (maybe the last commit point, or the reader from before the refresh after you re-indexed all docs in a given, now 100% deleted, segment).

Does Solr keep old readers open, older than the most recent commit?  Do
you have queries in flight that might be holding the old reader open?

Given that your small by-hand test case (3 docs) correctly showed the 100% deleted segment being reclaimed after the soft commit interval or a manual hard commit, something must be different in the larger use case that is causing Solr to keep an old reader open. Is there any logging you can enable to understand Solr's handling of its IndexReaders' lifecycle?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Aug 28, 2023 at 10:20 PM Rahul Goswami <rahul196...@gmail.com>
wrote:

Hello,
I am trying to execute a program that reads documents segment-by-segment and reindexes them into the same index. I am reading using the Lucene APIs and indexing using the Solr API (into a core that is currently loaded).

What I am observing is that even after a segment has been fully processed and an autoCommit (as well as autoSoftCommit) has kicked in, the segment with 0 live docs gets left behind. *Upon Solr restart, the segment does get cleared successfully.*

I tried to replicate the same thing without my code by indexing 3 docs on an empty test core and then reindexing the same docs. The older segment gets deleted as soon as the softCommit interval hits or an explicit commit=true is called.

Here are the two approaches I have tried. Approach 2 is inspired by the merge logic's way of accessing segments, in case opening a DirectoryReader externally (Approach 1) is what is causing this issue.

But both approaches leave undeleted segments behind until I restart Solr and load the core again. What am I missing? I don't have any more brain cells left to fry on this!

Approach 1:
=========
try (FSDirectory dir = FSDirectory.open(Paths.get(core.getIndexDir()));
     IndexReader reader = DirectoryReader.open(dir)) {
    for (LeafReaderContext lrc : reader.leaves()) {
        // read live docs from each leaf, create a SolrInputDocument
        // out of each Document, and index it using the Solr API
    }
} catch (Exception e) {
    // handle/log
}

Approach 2:
==========
ReadersAndUpdates rld = null;
SegmentReader segmentReader = null;
RefCounted<IndexWriter> iwRef = core.getSolrCoreState().getIndexWriter(core);
IndexWriter iw = iwRef.get();
try {
    for (SegmentCommitInfo sci : segmentInfos) {
        rld = iw.getPooledInstance(sci, true);
        segmentReader = rld.getReader(IOContext.READ);

        // process all live docs similar to above using the segmentReader

        rld.release(segmentReader);
        iw.release(rld);
    }
} finally {
    if (iwRef != null) {
        iwRef.decref();
    }
}

Help would be much appreciated!

Thanks,
Rahul

--
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: u...@thetaphi.de


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

