[ https://issues.apache.org/jira/browse/SOLR-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13169394#comment-13169394 ]
Robert Muir commented on SOLR-2968:
-----------------------------------
{quote}
Marcin Miłkowski has a set of scripts for that and he, as far as I recall, used
aspell/ispell to "dump" all of their forms by feeding the input dictionary
basically. I think hunspell provides more intelligent handling of words outside
of the dictionary so there's value in it that morfologik doesn't have.
{quote}
I think what you describe is essentially, at a high level, exactly what the
hunspellfilter does. Theoretically, more intelligent handling (such as
correcting spelling) is possible, but this isn't implemented, it is for the
most part not interesting for search anyway, and there is definitely no OOV
(out-of-vocabulary) mechanism.
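To make the "no OOV mechanism" point concrete, here is a toy sketch in Python of dictionary-backed suffix stripping. It is not the hunspellfilter code and does not read real .aff/.dic files; the dictionary, the rule table, and the function name stems are all made up for illustration. The only thing it is meant to show is that a word whose stripped form is not in the dictionary gets no stem at all.

```python
# Toy illustration only: NOT Lucene's HunspellStemFilter and NOT the real
# hunspell .aff/.dic format. All names and data below are hypothetical.

# Known stems, standing in for the entries of a dictionary file.
DICTIONARY = {"walk", "try", "cat"}

# Suffix rules as (suffix to strip, text to add back).
SUFFIX_RULES = [
    ("ed", ""),    # walked -> walk
    ("ies", "y"),  # tries  -> try
    ("s", ""),     # cats   -> cat
]


def stems(word):
    """Return the dictionary stems reachable by undoing at most one suffix rule."""
    found = []
    # A word that is itself a dictionary entry is its own stem.
    if word in DICTIONARY:
        found.append(word)
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if the dictionary knows it.
            if candidate in DICTIONARY and candidate not in found:
                found.append(candidate)
    # Out-of-vocabulary words fall through with an empty result:
    # there is no guessing and no spelling correction.
    return found
```

With this sketch, stems("walked") finds "walk", while stems("blogged") returns an empty list because "blogg" is not a dictionary entry.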
> Hunspell very high memory use when loading dictionary
> -----------------------------------------------------
>
> Key: SOLR-2968
> URL: https://issues.apache.org/jira/browse/SOLR-2968
> Project: Solr
> Issue Type: Bug
> Affects Versions: 3.5
> Reporter: Maciej Lisiewski
> Priority: Minor
>
> Hunspell stemmer requires gigantic (for the task) amounts of memory to load
> dictionary/rules files.
> For example, loading a 4.5 MB Polish dictionary (with an empty index!) will
> cause the whole core to crash with various out of memory errors unless you set
> the max heap size close to 2GB or more.
> By comparison Stempel using the same dictionary file works just fine with 1/8
> of that (and possibly lower values as well).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z