[ 
https://issues.apache.org/jira/browse/SOLR-14428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093979#comment-17093979
 ] 

David Smiley commented on SOLR-14428:
-------------------------------------

BTW, this should probably be a Lucene JIRA issue, but let's see where this 
thread goes first.

This is a tricky issue with no easy solutions.  

I think I lean towards a SoftReference providing the best trade-off.
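
As a rough illustration of the SoftReference idea (a sketch only: SoftMemo is a 
hypothetical helper, not a Lucene or Solr class, and the Supplier stands in for 
whatever builds the automata), the expensive value is built lazily, held 
softly so the GC may reclaim it under memory pressure, and rebuilt on demand:

```java
import java.lang.ref.SoftReference;
import java.util.function.Supplier;

/** Hypothetical holder, for illustration only; not part of Lucene. */
final class SoftMemo<T> {
    private final Supplier<T> builder;                 // must not return null
    private SoftReference<T> ref = new SoftReference<>(null);
    int builds = 0;                                    // counts (re)builds, for illustration

    SoftMemo(Supplier<T> builder) {
        this.builder = builder;
    }

    /** Returns the cached value, rebuilding it if the GC has cleared it. */
    synchronized T get() {
        T value = ref.get();
        if (value == null) {
            value = builder.get();
            builds++;
            ref = new SoftReference<>(value);
        }
        return value;
    }
}
```

The trade-off is that a cached query whose automata were reclaimed pays the 
construction cost again the next time it is executed.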

Query objects ought to be light-weight in general, so in that respect, the 
change to FuzzyQuery is disappointing.  Maybe FuzzyQuery computing the 
automaton up-front should be optional?  It could be a boolean; that's the 
simplest thing.  Or imagine a Function<FuzzyQuery,CompiledAutomata[]> supplied 
in the constructor that is potentially memoizable.  Someone could even plug in 
a cache.  This is all a bit complicated, though, and it would put the query 
parser in charge of the choice; no parser has an option for this hypothetical 
choice yet.
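
To make the pluggable-function idea concrete, here is a minimal sketch.  
Everything in it is hypothetical: FuzzyQuerySketch and Automata are stand-ins 
rather than the real FuzzyQuery and CompiledAutomata, and memoize() is just one 
possible way a caller might share results between equal queries:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class PluggableAutomataSketch {
    /** Stand-in for a compiled automaton (hypothetical). */
    static final class Automata {
        final String description;
        Automata(String description) { this.description = description; }
    }

    /** Stand-in for a fuzzy query that defers automata construction (hypothetical). */
    static final class FuzzyQuerySketch {
        final String term;
        final int maxEdits;
        private final Function<FuzzyQuerySketch, Automata[]> builder;

        FuzzyQuerySketch(String term, int maxEdits,
                         Function<FuzzyQuerySketch, Automata[]> builder) {
            this.term = term;
            this.maxEdits = maxEdits;
            this.builder = builder;
        }

        /** Called at search time; the query itself holds no automata. */
        Automata[] automata() { return builder.apply(this); }
    }

    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    /** Pretend-expensive construction: one automaton per edit distance 0..maxEdits. */
    static Automata[] build(FuzzyQuerySketch q) {
        BUILD_COUNT.incrementAndGet();
        Automata[] result = new Automata[q.maxEdits + 1];
        for (int d = 0; d <= q.maxEdits; d++) {
            result[d] = new Automata(q.term + "~" + d);
        }
        return result;
    }

    /** Wraps a builder so queries with the same term and maxEdits share one result. */
    static Function<FuzzyQuerySketch, Automata[]> memoize(
            Function<FuzzyQuerySketch, Automata[]> delegate) {
        Map<String, Automata[]> cache = new ConcurrentHashMap<>();
        return q -> cache.computeIfAbsent(q.term + "~" + q.maxEdits,
                                          key -> delegate.apply(q));
    }
}
```

A parser could pass PluggableAutomataSketch::build directly for eager-style 
behaviour, or memoize(PluggableAutomataSketch::build) to share automata; the 
unbounded map here would need eviction in any real use.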

Or maybe we just accept that sometimes, queries can be big, and thus Lucene 
users like Solr just have to deal with it.  If a Query is beyond the size of 
some threshold, maybe we don't cache it by default unless the user explicitly 
chooses to.  That's a nice generic solution.  The QueryResultKey could key on 
the original input string instead of the Query object, so it wouldn't be 
affected by the Query's size.  That would also short-circuit query parsing, 
which might be a performance benefit as well?
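
A size threshold on cache admission could look roughly like the sketch below.  
SizeLimitedCache and its keySize estimator are assumptions made for 
illustration; this is not how Solr's queryResultCache is implemented.  In real 
code the estimator might delegate to something like ramBytesUsed:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

/** Hypothetical LRU cache that refuses keys above a size threshold. */
final class SizeLimitedCache<K, V> {
    private final long maxKeyBytes;
    private final ToLongFunction<K> keySize;   // caller-supplied size estimate
    private final Map<K, V> entries;

    SizeLimitedCache(int maxEntries, long maxKeyBytes, ToLongFunction<K> keySize) {
        this.maxKeyBytes = maxKeyBytes;
        this.keySize = keySize;
        // Access-ordered LinkedHashMap gives simple LRU eviction.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Caches the entry unless the key is too large; returns whether it was cached. */
    synchronized boolean put(K key, V value) {
        if (keySize.applyAsLong(key) > maxKeyBytes) {
            return false;                      // oversized query: skip caching
        }
        entries.put(key, value);
        return true;
    }

    synchronized V get(K key) {
        return entries.get(key);
    }
}
```

With such a gate, an oversized query is simply re-executed each time instead 
of pinning its automata in the cache.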

> FuzzyQuery has severe memory usage in 8.5
> -----------------------------------------
>
>                 Key: SOLR-14428
>                 URL: https://issues.apache.org/jira/browse/SOLR-14428
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>    Affects Versions: 8.5, 8.5.1
>            Reporter: Colvin Cowie
>            Assignee: Andrzej Bialecki
>            Priority: Major
>         Attachments: FuzzyHammer.java, SOLR-14428-WeakReferences.patch, 
> image-2020-04-23-09-18-06-070.png, image-2020-04-24-20-09-31-179.png, 
> screenshot-2.png, screenshot-3.png, screenshot-4.png
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I sent this to the mailing list
> I'm moving from 8.3.1 to 8.5.1, and started getting Out Of Memory Errors 
> while running our normal tests. After profiling it was clear that the 
> majority of the heap was allocated through FuzzyQuery.
> LUCENE-9068 moved construction of the automata from the FuzzyTermsEnum to the 
> FuzzyQuery's constructor.
> I created a little test ( [^FuzzyHammer.java] ) that fires off fuzzy queries 
> from random UUID strings for 5 minutes
> {code}
> FIELD_NAME + ":" + UUID.randomUUID().toString().replace("-", "") + "~2"
> {code}
> When running against a vanilla Solr 8.3.1 and 8.4.1 there is no problem, while 
> the memory usage has increased drastically on 8.5.0 and 8.5.1.
> Comparison of heap usage while running the attached test against Solr 8.3.1 
> and 8.5.1 with a single (empty) shard and 4GB heap:
> !image-2020-04-23-09-18-06-070.png! 
> And with 4 shards on 8.4.1 and 8.5.0:
>  !screenshot-2.png! 
> I'm guessing that the memory might be being leaked if the FuzzyQuery objects 
> are referenced from the cache, while the FuzzyTermsEnum would not have been.
> Query Result Cache on 8.5.1:
>  !screenshot-3.png! 
> ~316mb in the cache
> QRC on 8.3.1
>  !screenshot-4.png! 
> <1mb
> With an empty cache, running this query 
> _field_s:e41848af85d24ac197c71db6888e17bc~2_ results in the following memory 
> allocation
> {noformat}
> 8.3.1: CACHE.searcher.queryResultCache.ramBytesUsed:      1520
> 8.5.1: CACHE.searcher.queryResultCache.ramBytesUsed:    648855
> {noformat}
> ~1 gives 98253 and ~0 gives 6339 on 8.5.1. 8.3.1 is constant at 1520



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
