[
https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043991#comment-13043991
]
James Dyer commented on SOLR-2462:
----------------------------------
Yeah, I agree the time limit is a bit of a hack. On the other hand, the
list of possibilities it needs to evaluate can get really long really fast. If
you're returning 15 or 20 suggestions per word and the user misspells 10 or so
words, you get a pretty big list of combinations (in our case, users were
pasting the URL into the search box, generating a query with 12 "misspelled"
words). Then again, this latest version is much faster than what I had put
out there originally...
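To put a rough number on "really long really fast": if every misspelled word gets the same number of suggestions, the number of candidate collations is that count raised to the number of misspelled words. The snippet below is just back-of-the-envelope arithmetic; the class and method names are made up for illustration and are not part of Solr.

```java
import java.math.BigInteger;

// Hypothetical helper, not Solr code: counts how many correction
// combinations exist when each misspelled word has the same number
// of suggestions.
public class CollationCount {

    // suggestionsPerWord ^ misspelledWords, using BigInteger so the
    // result cannot overflow.
    public static BigInteger combinations(int suggestionsPerWord, int misspelledWords) {
        return BigInteger.valueOf(suggestionsPerWord).pow(misspelledWords);
    }

    public static void main(String[] args) {
        // spellcheck.count=15 with 12 "misspelled" words (our pasted-URL case)
        System.out.println(combinations(15, 12)); // 129746337890625
        // 20 suggestions per word with 10 misspelled words
        System.out.println(combinations(20, 10)); // 10240000000000
    }
}
```

So the pasted-URL case is on the order of 10^14 possibilities, which is why some kind of limit is needed no matter how fast each evaluation is.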
Maybe we can just put a hard limit on the number of possibilities it will
evaluate? It could be really high like a million or something. We could make
it a configurable parameter, something like
"spellcheck.maxCollationPossibilitiesToEval", but then again that seems silly.
Who would really change it if a million was the default?
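For illustration only, here is a minimal sketch of what a hard limit could look like. It is not the code in the attached patches and not how SpellPossibilityIterator works: it does no ranking, and the class name, method name, and maxToEvaluate argument are all hypothetical. It walks the combinations odometer-style and simply stops once the cap is reached, so memory is bounded by the cap rather than by the full product.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Solr code: enumerate correction combinations
// (one suggestion per misspelled word) but never produce more than
// maxToEvaluate of them.
public class CappedCombinations {

    public static List<List<String>> enumerate(List<List<String>> suggestionsPerWord,
                                               int maxToEvaluate) {
        List<List<String>> result = new ArrayList<>();
        int n = suggestionsPerWord.size();
        if (n == 0 || maxToEvaluate <= 0) {
            return result;
        }
        // A word with no suggestions means no complete combination exists.
        for (List<String> suggestions : suggestionsPerWord) {
            if (suggestions.isEmpty()) {
                return result;
            }
        }
        int[] idx = new int[n]; // current suggestion index for each word
        while (result.size() < maxToEvaluate) {
            List<String> combo = new ArrayList<>(n);
            for (int i = 0; i < n; i++) {
                combo.add(suggestionsPerWord.get(i).get(idx[i]));
            }
            result.add(combo);
            // Advance like an odometer, rightmost word first.
            int pos = n - 1;
            while (pos >= 0) {
                idx[pos]++;
                if (idx[pos] < suggestionsPerWord.get(pos).size()) {
                    break;
                }
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break; // every combination has been produced
            }
        }
        return result;
    }
}
```

With two words having 2 and 3 suggestions and a cap of 4, this returns only the first 4 of the 6 possible combinations; with a cap of a million it returns all 6. A real version would presumably want to visit the most promising combinations first, so that the cap cuts off the least likely ones.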
At the end of the day, I'd feel better if Solr had some kind of
secondary fallback here. One thing that really made me nervous about our
previous search engine is that it wasn't terribly hard to send it a query
that would crash the thing or make it churn a long time just to return nothing.
So far my experience is that Solr is less prone to this kind of failure and
I'd really like to keep it that way...
> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
> Key: SOLR-2462
> URL: https://issues.apache.org/jira/browse/SOLR-2462
> Project: Solr
> Issue Type: Bug
> Components: spellchecker
> Affects Versions: 3.1
> Reporter: James Dyer
> Priority: Critical
> Fix For: 3.1.1, 4.0
>
> Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch,
> SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch,
> SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a
> ranked list of *every* possible correction combination. But if returning
> several corrections per term, and if several words are misspelled, the
> existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010. However, it is triggered anytime
> "spellcheck.collate" is used. It is not necessary to use any features that
> were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking
> our Solr servers down with "infinite" GC loops. It was pretty easy for this
> to happen as occasionally a user will accidentally paste the URL into the
> Search box on our app. This URL results in a search with ~12 misspelled
> words. We have "spellcheck.count" set to 15.