[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage

James Dyer (JIRA) Fri, 03 Jun 2011 12:33:44 -0700

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043991#comment-13043991 ]


James Dyer commented on SOLR-2462:
----------------------------------

Yeah, I agree the time limit is a bit of a hack.  On the other hand, the 
list of possibilities it needs to evaluate can get really long really fast.  If 
you're returning 15 or 20 suggestions per word and the user misspells 10 or so 
words, you get a pretty big list of combinations (in our case, users were 
pasting a URL into the search box, generating a query with 12 "misspelled" 
words...).  Then again, this latest version is much faster than what I had put 
out there originally...
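
To put rough numbers on that growth: if every misspelled word gets the same number of suggestions, the number of candidate collations is simply suggestions-per-word raised to the number of misspelled words. The snippet below is only a back-of-the-envelope sketch; the class and method names are made up for illustration and are not part of Solr.

```java
import java.math.BigInteger;

// Hypothetical helper (not a Solr class): counts candidate collations,
// assuming every misspelled word has the same number of suggestions.
class CollationCount {

    // combinations = suggestionsPerWord ^ misspelledWords
    static BigInteger combinations(int suggestionsPerWord, int misspelledWords) {
        return BigInteger.valueOf(suggestionsPerWord).pow(misspelledWords);
    }

    public static void main(String[] args) {
        // 15 suggestions, 10 misspelled words
        System.out.println(combinations(15, 10)); // 576650390625
        // 20 suggestions, 10 misspelled words
        System.out.println(combinations(20, 10)); // 10240000000000
        // The pasted-URL case: 15 suggestions, 12 "misspelled" words
        System.out.println(combinations(15, 12)); // 129746337890625
    }
}
```

Even the smallest of these is far more than can be held in memory as a ranked list, which is why some bound on the work seems necessary.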

Maybe we can just put a hard limit on the number of possibilities it will 
evaluate?  It could be really high, like a million or something.  We could make 
it a configurable parameter, something like 
"spellcheck.maxCollationPossibilitiesToEval", but then again that seems silly.  
Who would really change it if a million was the default?
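
As a rough illustration of what such a hard limit could look like, here is a minimal sketch of an iterator that generates combinations lazily and stops after a fixed number of them. It is not the code in the attached patches: the class name and the maxToEvaluate argument are hypothetical, and it walks the combinations in simple odometer order rather than the ranked order SpellPossibilityIterator produces.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical sketch (not Solr code): yields one suggestion per word,
// one combination at a time, and gives up after maxToEvaluate of them.
class CappedCombinations implements Iterator<List<String>> {
    private final List<List<String>> suggestions; // one list per misspelled word
    private final int[] indexes;                  // current position in each list
    private final long maxToEvaluate;             // the hard limit
    private long evaluated = 0;
    private boolean exhausted;

    CappedCombinations(List<List<String>> suggestions, long maxToEvaluate) {
        this.suggestions = suggestions;
        this.indexes = new int[suggestions.size()];
        this.maxToEvaluate = maxToEvaluate;
        // No combinations exist if there are no words or a word has no suggestions.
        this.exhausted = suggestions.isEmpty();
        for (List<String> perWord : suggestions) {
            if (perWord.isEmpty()) {
                this.exhausted = true;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !exhausted && evaluated < maxToEvaluate;
    }

    @Override
    public List<String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        List<String> combo = new ArrayList<>(indexes.length);
        for (int i = 0; i < indexes.length; i++) {
            combo.add(suggestions.get(i).get(indexes[i]));
        }
        evaluated++;
        advance();
        return combo;
    }

    // Odometer-style increment: bump the last index, carrying to the left.
    private void advance() {
        for (int i = indexes.length - 1; i >= 0; i--) {
            if (++indexes[i] < suggestions.get(i).size()) {
                return;
            }
            indexes[i] = 0;
        }
        exhausted = true; // every index wrapped around: nothing left
    }
}
```

Because only the current index array is kept, memory stays constant no matter how many combinations exist, and the cap bounds the total work even for a pasted-URL query.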

At the end of the day, I'd feel better if Solr had some kind of secondary 
fallback here.  One thing that really made me nervous about our previous 
search engine was that it wasn't terribly hard to send it a query that would 
crash it or make it churn for a long time just to return nothing.  So far my 
experience is that Solr is less prone to this kind of failure, and I'd really 
like to keep it that way...

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, 
> SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a 
> ranked list of *every* possible correction combination.  But if returning 
> several corrections per term, and if several words are misspelled, the 
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime 
> "spellcheck.collate" is used.  It is not necessary to use any features that 
> were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking 
> our Solr servers down with "infinite" GC loops.  It was pretty easy for this 
> to happen as occasionally a user will accidentally paste the URL into the 
> Search box on our app.  This URL results in a search with ~12 misspelled 
> words.  We have "spellcheck.count" set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

