Thanks David!

Created https://issues.apache.org/jira/browse/LUCENE-9617 and posted a PR:
https://github.com/apache/lucene-solr/pull/2088

On Wed, Nov 18, 2020 at 10:26 AM David Smiley <dsmi...@apache.org> wrote:

> Thanks for sharing the background of your indexing serialization
> shenanigans :-) -- interesting.
>
> I think IndexWriter.deleteAll() should ultimately reset
> lowestUnassignedFieldNumber.  globalFieldNumberMap.clear() is only called
> by deleteAll, so this simple proposal makes sense to me.  File a JIRA issue.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Nov 18, 2020 at 1:17 PM Michael Froh <msf...@gmail.com> wrote:
>
>> I have some code that is kind of abusing IndexWriter.deleteAll(). In
>> short, I'm basically experimenting with using tiny indexes (each holding
>> one block of joined parent/child documents) as a serialized format to index
>> on one fleet and then merge these tiny indexes on another fleet. I'm doing this by
>> indexing a block, committing, storing the contents of the index directory
>> in a zip file, invoking deleteAll(), and repeating. Believe it or not, the
>> performance is not terrible. (Currently getting about 20% of the throughput
>> I see with regular indexing.)
>>
>> Regardless of my serialization shenanigans above, I've found that
>> performance degrades over time for the process, as it spends more time
>> allocating and freeing memory. Heap dumps show that this is because
>> FieldInfos.byNumber is getting bigger and bigger. IndexWriter.deleteAll()
>> doesn't truly reset state. Specifically, it calls
>> globalFieldNumberMap.clear(), which clears all of the FieldNumbers
>> collections, but it doesn't reset lowestUnassignedFieldNumber. So, that
>> number keeps counting up, and new instances of FieldInfos allocate larger
>> and larger arrays (and only use the top indices).
>>
>> Has anyone else encountered this? Can I open an issue for resetting
>> lowestUnassignedFieldNumber in FieldNumbers.clear()? Is there any risk in
>> doing so?
>>
>> (For my specific use-case, I would be okay with not clearing
>> globalFieldNumberMap at all, since the set of fields is bounded, but
>> assigning new field numbers is probably among the least of my costs.)
>>
>
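
To make the behavior described in the quoted message concrete, here is a
minimal toy model in Java. It is a sketch, not Lucene's code: the names
ToyFieldNumbers, addOrGet, arraySizeAfter and resetCounterOnClear are
hypothetical and exist only for this illustration. It assumes that a
by-number array must be sized to the highest assigned field number plus one,
as the message describes for FieldInfos.byNumber.

```java
import java.util.HashMap;
import java.util.Map;

public class FieldNumberSketch {

    /** Toy stand-in for a global field-number map; NOT the real Lucene class. */
    static class ToyFieldNumbers {
        private final Map<String, Integer> nameToNumber = new HashMap<>();
        private int lowestUnassignedFieldNumber = -1;
        private final boolean resetCounterOnClear;

        ToyFieldNumbers(boolean resetCounterOnClear) {
            this.resetCounterOnClear = resetCounterOnClear;
        }

        /** Returns the number for a field, assigning the next free one if new. */
        int addOrGet(String name) {
            Integer number = nameToNumber.get(name);
            if (number == null) {
                number = ++lowestUnassignedFieldNumber;
                nameToNumber.put(name, number);
            }
            return number;
        }

        /** Forgets all fields; optionally also resets the counter (the proposal). */
        void clear() {
            nameToNumber.clear();
            if (resetCounterOnClear) {
                lowestUnassignedFieldNumber = -1;
            }
        }
    }

    /**
     * Simulates repeated "index a block, then clear" cycles and returns the
     * size a by-number array would need in the final cycle (max number + 1).
     */
    static int arraySizeAfter(int cycles, int fieldsPerCycle, boolean reset) {
        ToyFieldNumbers numbers = new ToyFieldNumbers(reset);
        int maxNumber = -1;
        for (int c = 0; c < cycles; c++) {
            maxNumber = -1;
            for (int f = 0; f < fieldsPerCycle; f++) {
                maxNumber = Math.max(maxNumber, numbers.addOrGet("field" + f));
            }
            numbers.clear();
        }
        return maxNumber + 1;
    }

    public static void main(String[] args) {
        System.out.println("without reset: " + arraySizeAfter(1000, 5, false));
        System.out.println("with reset:    " + arraySizeAfter(1000, 5, true));
    }
}
```

In this toy model, the array size without the reset grows linearly with the
number of cycles even though only five fields are live at any time. With the
reset it stays at five.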
