Thanks for sharing the background of your indexing serialization shenanigans :-) -- interesting.

I think IndexWriter.deleteAll() should ultimately reset lowestUnassignedFieldNumber. globalFieldNumberMap.clear() is only called by deleteAll, so this simple proposal makes sense to me. File a JIRA issue.

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley

On Wed, Nov 18, 2020 at 1:17 PM Michael Froh <msf...@gmail.com> wrote:
> I have some code that is kind of abusing IndexWriter.deleteAll(). In
> short, I'm basically experimenting with using tiny (one block of joined
> parent/child documents) indexes as a serialized format to index on one
> fleet and then merge these tiny indexes on another fleet. I'm doing this by
> indexing a block, committing, storing the contents of the index directory
> in a zip file, invoking deleteAll(), and repeating. Believe it or not, the
> performance is not terrible. (Currently getting about 20% of the throughput
> I see with regular indexing.)
>
> Regardless of my serialization shenanigans above, I've found that
> performance degrades over time for the process, as it spends more time
> allocating and freeing memory. Analyzing some heap dumps, it's because
> FieldInfos.byNumber is getting bigger and bigger. IndexWriter.deleteAll()
> doesn't truly reset state. Specifically, it calls
> globalFieldNumberMap.clear(), which clears all of the FieldNumbers
> collections, but it doesn't reset lowestUnassignedFieldNumber. So, that
> number keeps counting up, and new instances of FieldInfos allocate larger
> and larger arrays (and only use the top indices).
>
> Has anyone else encountered this? Can I open an issue for resetting
> lowestUnassignedFieldNumber in FieldNumbers.clear()? Is there any risk in
> doing so?
>
> (For my specific use-case, I would be okay with not clearing
> globalFieldNumberMap at all, since the set of fields is bounded, but
> assigning new field numbers is probably among the least of my costs.)
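To make the growth mechanism in this report concrete, here is a minimal, hypothetical Java model of it. `SimpleFieldNumbers`, `byNumberLengthAfterRounds`, and the field names are invented for illustration; this is not Lucene's actual `FieldInfos.FieldNumbers` code. It assumes, as the report describes, that a per-segment `byNumber`-style array is sized by the highest field number in use, and that `clear()` forgets field names without rewinding the counter unless the proposed reset is applied.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical, simplified model of a global field-number map (NOT Lucene's
 * FieldInfos.FieldNumbers). It only shows why a counter that survives clear()
 * makes dense, number-indexed arrays keep growing.
 */
public class FieldNumbersDemo {

  static final class SimpleFieldNumbers {
    private final Map<String, Integer> nameToNumber = new HashMap<>();
    private int lowestUnassignedFieldNumber = -1;
    private final boolean resetCounterOnClear;

    SimpleFieldNumbers(boolean resetCounterOnClear) {
      this.resetCounterOnClear = resetCounterOnClear;
    }

    /** Returns the existing number for a field, or assigns the next unused one. */
    int addOrGet(String fieldName) {
      Integer existing = nameToNumber.get(fieldName);
      if (existing != null) {
        return existing;
      }
      int number = ++lowestUnassignedFieldNumber;
      nameToNumber.put(fieldName, number);
      return number;
    }

    /** Forgets all field names; optionally also rewinds the counter (the proposed fix). */
    void clear() {
      nameToNumber.clear();
      if (resetCounterOnClear) {
        lowestUnassignedFieldNumber = -1;
      }
    }
  }

  /**
   * Simulates `rounds` cycles of "index one block, then deleteAll()" and returns
   * the length a dense byNumber-style array would need in the last round.
   */
  static int byNumberLengthAfterRounds(int rounds, boolean resetCounterOnClear) {
    String[] fields = {"id", "parent", "child"}; // hypothetical, bounded field set
    SimpleFieldNumbers numbers = new SimpleFieldNumbers(resetCounterOnClear);
    int length = 0;
    for (int round = 0; round < rounds; round++) {
      int maxNumber = -1;
      for (String field : fields) {
        maxNumber = Math.max(maxNumber, numbers.addOrGet(field));
      }
      // An array indexed by field number needs maxNumber + 1 slots,
      // even though only fields.length of them are populated.
      length = maxNumber + 1;
      numbers.clear(); // stands in for deleteAll() -> globalFieldNumberMap.clear()
    }
    return length;
  }

  public static void main(String[] args) {
    for (int rounds : new int[] {1, 10, 100}) {
      System.out.printf("rounds=%d  without reset=%d  with reset=%d%n",
          rounds,
          byNumberLengthAfterRounds(rounds, false),
          byNumberLengthAfterRounds(rounds, true));
    }
  }
}
```

With three fields per block, the model needs 3, 30, and 300 slots after 1, 10, and 100 rounds when the counter is not reset, but stays at 3 when `clear()` rewinds it. Under this model, the alternative mentioned in the report (never clearing the map, since the field set is bounded) would also keep the array size constant, because existing names keep their numbers.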