Thanks David! Created https://issues.apache.org/jira/browse/LUCENE-9617 and posted a PR: https://github.com/apache/lucene-solr/pull/2088
On Wed, Nov 18, 2020 at 10:26 AM David Smiley <dsmi...@apache.org> wrote:

> Thanks for sharing the background of your indexing serialization
> shenanigans :-) -- interesting.
>
> I think IndexWriter.deleteAll() should ultimately reset
> lowestUnassignedFieldNumber. globalFieldNumberMap.clear() is only called
> by deleteAll, so this simple proposal makes sense to me. File a JIRA issue.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Nov 18, 2020 at 1:17 PM Michael Froh <msf...@gmail.com> wrote:
>
>> I have some code that is kind of abusing IndexWriter.deleteAll(). In
>> short, I'm basically experimenting with using tiny (one block of joined
>> parent/child documents) indexes as a serialized format to index on one
>> fleet and then merge these tiny indexes on another fleet. I'm doing this by
>> indexing a block, committing, storing the contents of the index directory
>> in a zip file, invoking deleteAll(), and repeating. Believe it or not, the
>> performance is not terrible. (Currently getting about 20% of the throughput
>> I see with regular indexing.)
>>
>> Regardless of my serialization shenanigans above, I've found that
>> performance degrades over time for the process, as it spends more time
>> allocating and freeing memory. Analyzing some heap dumps, it's because
>> FieldInfos.byNumber is getting bigger and bigger. IndexWriter.deleteAll()
>> doesn't truly reset state. Specifically, it calls
>> globalFieldNumberMap.clear(), which clears all of the FieldNumbers
>> collections, but it doesn't reset lowestUnassignedFieldNumber. So, that
>> number keeps counting up, and new instances of FieldInfos allocate larger
>> and larger arrays (and only use the top indices).
>>
>> Has anyone else encountered this? Can I open an issue for resetting
>> lowestUnassignedFieldNumber in FieldNumbers.clear()? Is there any risk in
>> doing so?
>>
>> (For my specific use-case, I would be okay with not clearing
>> globalFieldNumberMap at all, since the set of fields is bounded, but
>> assigning new field numbers is probably among the least of my costs.)
>>
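
For anyone reading along, the growth described in the quoted message can be reproduced with a toy model. This is only a rough sketch and not Lucene code: FieldNumbersSketch, addOrGet, byNumberLength, simulate, and the three field names are all made up for illustration, and the real FieldNumbers class tracks more state than a single map. It assumes field numbers are handed out from a counter and that a byNumber-style array has to be long enough to be indexed by the highest number in use.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a global field-number map. Hypothetical; not the Lucene class. */
class FieldNumbersSketch {
    private final Map<String, Integer> nameToNumber = new HashMap<>();
    private int lowestUnassignedFieldNumber = -1;
    private final boolean resetCounterOnClear;

    FieldNumbersSketch(boolean resetCounterOnClear) {
        this.resetCounterOnClear = resetCounterOnClear;
    }

    /** Returns the number already assigned to the field, or assigns the next one. */
    int addOrGet(String fieldName) {
        Integer number = nameToNumber.get(fieldName);
        if (number == null) {
            number = ++lowestUnassignedFieldNumber;
            nameToNumber.put(fieldName, number);
        }
        return number;
    }

    /** Forgets all fields; only resets the counter when the flag is set. */
    void clear() {
        nameToNumber.clear();
        if (resetCounterOnClear) {
            lowestUnassignedFieldNumber = -1;
        }
    }

    /** Length an array indexed by field number would need for these fields. */
    int byNumberLength(String... fieldNames) {
        int max = -1;
        for (String name : fieldNames) {
            max = Math.max(max, addOrGet(name));
        }
        return max + 1;
    }

    /** Runs index/clear cycles and returns the array length needed in the last cycle. */
    static int simulate(boolean resetCounterOnClear, int cycles) {
        FieldNumbersSketch numbers = new FieldNumbersSketch(resetCounterOnClear);
        int length = 0;
        for (int i = 0; i < cycles; i++) {
            // Same bounded set of (made-up) fields every cycle.
            length = numbers.byNumberLength("parent", "child", "id");
            numbers.clear(); // models the clear() that deleteAll() triggers
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + simulate(false, 1000)); // 3000
        System.out.println("with reset:    " + simulate(true, 1000));  // 3
    }
}
```

With only three fields, the array needed after 1000 cycles is 3000 slots long when the counter is never reset, and only the top three slots are used. Resetting the counter in clear() keeps it at 3.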