[
https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910017#comment-13910017
]
Robert Muir commented on LUCENE-5468:
-------------------------------------
I brought the previous FST patch up to date, then built a test that parses
many dictionaries and compares memory use. FAIL means that the current code
can't parse the dictionary (I fixed all of those issues here).
In general, RAM use is better, but in some cases it's still bad because of how
the affixes are represented. I still haven't removed my TreeMap yet either (I
wanted a way to test all the dictionaries like this before really locking
things down).
||dict||old RAM||new RAM||
|af_ZA.zip|18 MB|899 KB|
|ak_GH.zip|1.5 MB|71 KB|
|bg_BG.zip|FAIL|1.1 MB|
|ca_ANY.zip|28.9 MB|1.2 MB|
|ca_ES.zip|15.1 MB|1.2 MB|
|cop_EG.zip|2.1 MB|489.3 KB|
|cs_CZ.zip|50.4 MB|2.8 MB|
|cy_GB.zip|FAIL|1.6 MB|
|da_DK.zip|FAIL|750.8 KB|
|de_AT.zip|1.3 MB|293.1 KB|
|de_CH.zip|12.6 MB|895.6 KB|
|de_DE.zip|12.6 MB|895 KB|
|de_DE_comb.zip|102.2 MB|4.8 MB|
|de_DE_frami.zip|20.9 MB|1.2 MB|
|de_DE_neu.zip|101.5 MB|4.8 MB|
|el_GR.zip|74.3 MB|1.1 MB|
|en_AU.zip|8.1 MB|1.2 MB|
|en_CA.zip|9.8 MB|436.7 KB|
|en_GB-oed.zip|8.2 MB|1.2 MB|
|en_GB.zip|8.3 MB|1.2 MB|
|en_NZ.zip|8.4 MB|1.2 MB|
|eo.zip|4.9 MB|1.3 MB|
|eo_EO.zip|4.9 MB|1.3 MB|
|es_AR.zip|14.8 MB|3.9 MB|
|es_BO.zip|14.8 MB|3.9 MB|
|es_CL.zip|14.7 MB|3.9 MB|
|es_CO.zip|14.3 MB|3.8 MB|
|es_CR.zip|14.8 MB|3.9 MB|
|es_CU.zip|14.7 MB|3.9 MB|
|es_DO.zip|14.7 MB|3.9 MB|
|es_EC.zip|14.8 MB|3.9 MB|
|es_ES.zip|15.1 MB|4.1 MB|
|es_GT.zip|14.8 MB|3.9 MB|
|es_HN.zip|14.8 MB|3.9 MB|
|es_MX.zip|14.3 MB|3.8 MB|
|es_NEW.zip|15.5 MB|4.2 MB|
|es_NI.zip|14.8 MB|3.9 MB|
|es_PA.zip|14.8 MB|3.9 MB|
|es_PE.zip|14.2 MB|3.8 MB|
|es_PR.zip|14.7 MB|3.9 MB|
|es_PY.zip|14.8 MB|3.9 MB|
|es_SV.zip|14.8 MB|3.9 MB|
|es_UY.zip|14.8 MB|3.9 MB|
|es_VE.zip|14.3 MB|3.8 MB|
|et_EE.zip|53.6 MB|5.9 MB|
|fo_FO.zip|18.6 MB|485.7 KB|
|fr_FR-1990_1-3-2.zip|14 MB|636.4 KB|
|fr_FR-classique_1-3-2.zip|14 MB|743.1 KB|
|fr_FR_1-3-2.zip|14.5 MB|755.2 KB|
|fy_NL.zip|4.2 MB|272.8 KB|
|ga_IE.zip|14 MB|674.8 KB|
|gd_GB.zip|2.7 MB|111 KB|
|gl_ES.zip|FAIL|1.2 MB|
|gsc_FR.zip|FAIL|1.4 MB|
|gu_IN.zip|20.3 MB|914.9 KB|
|he_IL.zip|53.3 MB|1.8 MB|
|hi_IN.zip|2.7 MB|136.9 KB|
|hil_PH.zip|3.4 MB|164.8 KB|
|hr_HR.zip|29.7 MB|564.8 KB|
|hu_HU.zip|FAIL|17.6 MB|
|hu_HU_comb.zip|FAIL|19.9 MB|
|ia.zip|4.9 MB|211.9 KB|
|id_ID.zip|3.9 MB|218.4 KB|
|it_IT.zip|15.3 MB|1.6 MB|
|ku_TR.zip|1.6 MB|147.6 KB|
|la.zip|5.1 MB|2.5 MB|
|lt_LT.zip|15 MB|2.8 MB|
|lv_LV.zip|36.3 MB|1.9 MB|
|mg_MG.zip|2.9 MB|131.7 KB|
|mi_NZ.zip|FAIL|171.2 KB|
|mk_MK.zip|FAIL|436.9 KB|
|mos_BF.zip|13.3 MB|210 KB|
|mr_IN.zip|FAIL|115.5 KB|
|ms_MY.zip|4.1 MB|221.6 KB|
|nb_NO.zip|22.9 MB|1.4 MB|
|ne_NP.zip|5.5 MB|495.6 KB|
|nl_NL.zip|22.9 MB|1.1 MB|
|nl_med.zip|1.2 MB|60.2 KB|
|nn_NO.zip|16.5 MB|1 MB|
|nr_ZA.zip|3.1 MB|171.1 KB|
|ns_ZA.zip|1.7 MB|85.8 KB|
|ny_MW.zip|FAIL|69.6 KB|
|oc_FR.zip|9.1 MB|690.5 KB|
|pl_PL.zip|43.9 MB|4.9 MB|
|pt_BR.zip|FAIL|3.9 MB|
|pt_PT.zip|5.8 MB|773.4 KB|
|ro_RO.zip|5.1 MB|226.2 KB|
|ru_RU.zip|21.7 MB|1.4 MB|
|ru_RU_ye.zip|43.7 MB|1.6 MB|
|ru_RU_yo.zip|21.7 MB|1.4 MB|
|rw_RW.zip|1.6 MB|70.1 KB|
|sk_SK.zip|25.1 MB|2.3 MB|
|sl_SI.zip|38.3 MB|806.6 KB|
|sq_AL.zip|28.9 MB|654.6 KB|
|ss_ZA.zip|3.1 MB|176.3 KB|
|st_ZA.zip|1.7 MB|86.5 KB|
|sv_SE.zip|9.5 MB|668.8 KB|
|sw_KE.zip|6.3 MB|286 KB|
|tet_ID.zip|2 MB|92.4 KB|
|th_TH.zip|FAIL|377.4 KB|
|tl_PH.zip|2.6 MB|116.5 KB|
|tn_ZA.zip|1.5 MB|61.6 KB|
|ts_ZA.zip|1.6 MB|81 KB|
|uk_UA.zip|17.6 MB|3 MB|
|ve_ZA.zip|FAIL|108.8 KB|
|vi_VN.zip|1.7 MB|53.6 KB|
|xh_ZA.zip|3 MB|158.9 KB|
|zu_ZA.zip|24.5 MB|13.5 MB|
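To make the size of the improvement easier to read off the table, here is a minimal sketch that turns a table row into a reduction factor (old RAM divided by new RAM). The helper names `parse_kb` and `reduction` are hypothetical and are not part of the patch, and the sketch assumes that 1 MB means 1024 KB, which the table does not state.

```python
import re

# Assumption: binary units (1 MB = 1024 KB). The table does not say which
# convention was used to print the sizes.
UNITS_IN_KB = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}


def parse_kb(size):
    """Convert a size such as '18 MB' or '899 KB' to kilobytes.

    Returns None for 'FAIL', i.e. the old code could not parse the dictionary.
    """
    if size.strip().upper() == "FAIL":
        return None
    match = re.fullmatch(r"\s*([\d.]+)\s*(KB|MB|GB)\s*", size)
    if match is None:
        raise ValueError("unrecognized size: %r" % size)
    return float(match.group(1)) * UNITS_IN_KB[match.group(2)]


def reduction(row):
    """Parse a data row '|name|old|new|' and return (name, old/new).

    The factor is None when the old code failed, since there is no baseline.
    Header rows ('||dict||...') are not handled.
    """
    name, old, new = row.strip().strip("|").split("|")
    old_kb, new_kb = parse_kb(old), parse_kb(new)
    if old_kb is None:
        return name, None
    return name, old_kb / new_kb


if __name__ == "__main__":
    # Three rows copied from the table above.
    rows = [
        "|af_ZA.zip|18 MB|899 KB|",
        "|bg_BG.zip|FAIL|1.1 MB|",
        "|zu_ZA.zip|24.5 MB|13.5 MB|",
    ]
    for row in rows:
        name, factor = reduction(row)
        if factor is None:
            print(name + ": no baseline (old code failed)")
        else:
            print("%s: %.1fx smaller" % (name, factor))
```

Under that unit assumption, af_ZA shrinks by a factor of about 20, while zu_ZA shrinks by less than a factor of 2, so it is one of the rows where the new RAM figure is still high.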
> Hunspell very high memory use when loading dictionary
> -----------------------------------------------------
>
> Key: LUCENE-5468
> URL: https://issues.apache.org/jira/browse/LUCENE-5468
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 3.5
> Reporter: Maciej Lisiewski
> Priority: Minor
> Attachments: patch.txt
>
>
> The Hunspell stemmer requires gigantic (for the task) amounts of memory to load
> dictionary/rules files.
> For example, loading a 4.5 MB Polish dictionary (with an empty index!) will
> cause the whole core to crash with various out-of-memory errors unless you set
> the max heap size close to 2 GB or more.
> By comparison, Stempel using the same dictionary file works just fine with 1/8
> of that (and possibly lower values as well).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z