Thanks for sharing the background of your indexing serialization
shenanigans :-) -- interesting.

I think IndexWriter.deleteAll() should ultimately reset
lowestUnassignedFieldNumber.  globalFieldNumberMap.clear() is only called
by deleteAll, so this simple proposal makes sense to me.  Please file a JIRA issue.
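
To make the proposal concrete, here is a rough sketch of the behavior as I understand it from your description. FieldNumbersSketch is a simplified, hypothetical stand-in and not the actual FieldInfos.FieldNumbers class; the resetCounterOnClear flag and the highestNumberAfterCycles helper exist only for this illustration. It assumes field numbers are handed out from a single increasing counter and that clear() empties the name-to-number map.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical stand-in for the global field number map.
// This is NOT Lucene's FieldInfos.FieldNumbers; it only models the counter.
final class FieldNumbersSketch {
    private final Map<String, Integer> nameToNumber = new HashMap<>();
    private int lowestUnassignedFieldNumber = -1;
    // Illustration-only switch: true models the proposed fix.
    private final boolean resetCounterOnClear;

    FieldNumbersSketch(boolean resetCounterOnClear) {
        this.resetCounterOnClear = resetCounterOnClear;
    }

    // Returns the existing number for a field, or assigns the next free one.
    synchronized int addOrGet(String fieldName) {
        Integer number = nameToNumber.get(fieldName);
        if (number == null) {
            number = ++lowestUnassignedFieldNumber;
            nameToNumber.put(fieldName, number);
        }
        return number;
    }

    // Models what deleteAll() triggers: the map is emptied, and the counter
    // is reset only when the proposed fix is enabled.
    synchronized void clear() {
        nameToNumber.clear();
        if (resetCounterOnClear) {
            lowestUnassignedFieldNumber = -1;
        }
    }

    // Simulates repeated "index a block, then deleteAll()" cycles and returns
    // the highest field number assigned in the final cycle.
    static int highestNumberAfterCycles(boolean reset, int cycles, int fieldsPerCycle) {
        FieldNumbersSketch numbers = new FieldNumbersSketch(reset);
        int highest = -1;
        for (int c = 0; c < cycles; c++) {
            highest = -1;
            for (int f = 0; f < fieldsPerCycle; f++) {
                highest = Math.max(highest, numbers.addOrGet("field" + f));
            }
            numbers.clear();
        }
        return highest;
    }
}
```

In this model, without the reset the highest number grows with the number of cycles even though the set of fields is fixed, which matches the ever-larger byNumber arrays you saw in the heap dumps. With the reset it stays at fieldsPerCycle - 1.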

~ David Smiley
Apache Lucene/Solr Search Developer
http://www.linkedin.com/in/davidwsmiley


On Wed, Nov 18, 2020 at 1:17 PM Michael Froh <msf...@gmail.com> wrote:

> I have some code that is kind of abusing IndexWriter.deleteAll(). In
> short, I'm basically experimenting with using tiny (one block of joined
> parent/child documents) indexes as a serialized format to index on one
> fleet and then merge these tiny indexes on another fleet. I'm doing this by
> indexing a block, committing, storing the contents of the index directory
> in a zip file, invoking deleteAll(), and repeating. Believe it or not, the
> performance is not terrible. (Currently getting about 20% of the throughput
> I see with regular indexing.)
>
> Regardless of my serialization shenanigans above, I've found that
> performance degrades over time for the process, as it spends more time
> allocating and freeing memory. Analyzing some heap dumps, it's because
> FieldInfos.byNumber is getting bigger and bigger. IndexWriter.deleteAll()
> doesn't truly reset state. Specifically, it calls
> globalFieldNumberMap.clear(), which clears all of the FieldNumbers
> collections, but it doesn't reset lowestUnassignedFieldNumber. So, that
> number keeps counting up, and new instances of FieldInfos allocate larger
> and larger arrays (and only use the top indices).
>
> Has anyone else encountered this? Can I open an issue for resetting
> lowestUnassignedFieldNumber in FieldNumbers.clear()? Is there any risk in
> doing so?
>
> (For my specific use-case, I would be okay with not clearing
> globalFieldNumberMap at all, since the set of fields is bounded, but
> assigning new field numbers is probably among the least of my costs.)
>
