[ https://issues.apache.org/jira/browse/LUCENE-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13914799#comment-13914799 ]
Robert Muir commented on LUCENE-5468:
-------------------------------------
I am finished compressing for now. I think it's pretty reasonable across all the
languages. Here is the RAM used by each dictionary with the old and the new code.
I will clean up and try to add back the multiple-dictionary and ignore-case support,
and clean up some other things.
||dict||old RAM||new RAM||
|af_ZA.zip|18 MB|917.1 KB|
|ak_GH.zip|1.5 MB|103.2 KB|
|bg_BG.zip|FAIL|465.7 KB|
|ca_ANY.zip|28.9 MB|675.4 KB|
|ca_ES.zip|15.1 MB|639.8 KB|
|cop_EG.zip|2.1 MB|144.5 KB|
|cs_CZ.zip|50.4 MB|1.5 MB|
|cy_GB.zip|FAIL|627.4 KB|
|da_DK.zip|FAIL|669.8 KB|
|de_AT.zip|1.3 MB|123.9 KB|
|de_CH.zip|12.6 MB|725.4 KB|
|de_DE.zip|12.6 MB|726 KB|
|de_DE_comb.zip|102.2 MB|4.2 MB|
|de_DE_frami.zip|20.9 MB|1023.5 KB|
|de_DE_neu.zip|101.5 MB|4.2 MB|
|el_GR.zip|74.3 MB|1 MB|
|en_AU.zip|8.1 MB|521 KB|
|en_CA.zip|9.8 MB|450.5 KB|
|en_GB-oed.zip|8.2 MB|526.6 KB|
|en_GB.zip|8.3 MB|527.3 KB|
|en_NZ.zip|8.4 MB|532.4 KB|
|eo.zip|4.9 MB|310.5 KB|
|eo_EO.zip|4.9 MB|310.5 KB|
|es_AR.zip|14.8 MB|734.9 KB|
|es_BO.zip|14.8 MB|735 KB|
|es_CL.zip|14.7 MB|734.9 KB|
|es_CO.zip|14.3 MB|722.1 KB|
|es_CR.zip|14.8 MB|733.9 KB|
|es_CU.zip|14.7 MB|732.8 KB|
|es_DO.zip|14.7 MB|731.9 KB|
|es_EC.zip|14.8 MB|733.5 KB|
|es_ES.zip|15.1 MB|743 KB|
|es_GT.zip|14.8 MB|734.5 KB|
|es_HN.zip|14.8 MB|735.2 KB|
|es_MX.zip|14.3 MB|723.8 KB|
|es_NEW.zip|15.5 MB|768.5 KB|
|es_NI.zip|14.8 MB|734.5 KB|
|es_PA.zip|14.8 MB|733.8 KB|
|es_PE.zip|14.2 MB|721.3 KB|
|es_PR.zip|14.7 MB|732.4 KB|
|es_PY.zip|14.8 MB|734.1 KB|
|es_SV.zip|14.8 MB|733.6 KB|
|es_UY.zip|14.8 MB|736.9 KB|
|es_VE.zip|14.3 MB|722.7 KB|
|et_EE.zip|53.6 MB|473.6 KB|
|fo_FO.zip|18.6 MB|517.9 KB|
|fr_FR-1990_1-3-2.zip|14 MB|526.7 KB|
|fr_FR-classique_1-3-2.zip|14 MB|539.2 KB|
|fr_FR_1-3-2.zip|14.5 MB|550.4 KB|
|fy_NL.zip|4.2 MB|265.6 KB|
|ga_IE.zip|14 MB|460.6 KB|
|gd_GB.zip|2.7 MB|143.1 KB|
|gl_ES.zip|FAIL|479.4 KB|
|gsc_FR.zip|FAIL|1.3 MB|
|gu_IN.zip|20.3 MB|947 KB|
|he_IL.zip|53.3 MB|539.2 KB|
|hi_IN.zip|2.7 MB|169 KB|
|hil_PH.zip|3.4 MB|197 KB|
|hr_HR.zip|29.7 MB|573 KB|
|hu_HU.zip|FAIL|1.2 MB|
|hu_HU_comb.zip|FAIL|5.4 MB|
|ia.zip|4.9 MB|222.9 KB|
|id_ID.zip|3.9 MB|226.3 KB|
|it_IT.zip|15.3 MB|612.9 KB|
|ku_TR.zip|1.6 MB|118.7 KB|
|la.zip|5.1 MB|199.3 KB|
|lt_LT.zip|15 MB|682.5 KB|
|lv_LV.zip|36.3 MB|763.9 KB|
|mg_MG.zip|2.9 MB|163.8 KB|
|mi_NZ.zip|FAIL|191.4 KB|
|mk_MK.zip|FAIL|469.1 KB|
|mos_BF.zip|13.3 MB|242.2 KB|
|mr_IN.zip|FAIL|147.7 KB|
|ms_MY.zip|4.1 MB|226.9 KB|
|nb_NO.zip|22.9 MB|1.2 MB|
|ne_NP.zip|5.5 MB|328.1 KB|
|nl_NL.zip|22.9 MB|1.1 MB|
|nl_med.zip|1.2 MB|92.3 KB|
|nn_NO.zip|16.5 MB|914 KB|
|nr_ZA.zip|3.1 MB|203.3 KB|
|ns_ZA.zip|1.7 MB|118 KB|
|ny_MW.zip|FAIL|101.8 KB|
|oc_FR.zip|9.1 MB|401.5 KB|
|pl_PL.zip|43.9 MB|1.7 MB|
|pt_BR.zip|FAIL|2.1 MB|
|pt_PT.zip|5.8 MB|379.4 KB|
|ro_RO.zip|5.1 MB|256.3 KB|
|ru_RU.zip|21.7 MB|882 KB|
|ru_RU_ye.zip|43.7 MB|1.5 MB|
|ru_RU_yo.zip|21.7 MB|897.3 KB|
|rw_RW.zip|1.6 MB|102.3 KB|
|sk_SK.zip|25.1 MB|1.2 MB|
|sl_SI.zip|38.3 MB|604 KB|
|sq_AL.zip|28.9 MB|581.7 KB|
|ss_ZA.zip|3.1 MB|208.5 KB|
|st_ZA.zip|1.7 MB|118.7 KB|
|sv_SE.zip|9.5 MB|535.4 KB|
|sw_KE.zip|6.3 MB|318.2 KB|
|tet_ID.zip|2 MB|124.5 KB|
|th_TH.zip|FAIL|409.6 KB|
|tl_PH.zip|2.6 MB|148.7 KB|
|tn_ZA.zip|1.5 MB|93.7 KB|
|ts_ZA.zip|1.6 MB|113.1 KB|
|uk_UA.zip|17.6 MB|979.1 KB|
|ve_ZA.zip|FAIL|140.9 KB|
|vi_VN.zip|1.7 MB|85.8 KB|
|xh_ZA.zip|3 MB|191.1 KB|
|zu_ZA.zip|24.5 MB|827.1 KB|
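To read the table as a reduction factor (old RAM divided by new RAM), a minimal sketch in Java follows. It assumes 1 KB = 1024 bytes and 1 MB = 1024 KB, and it skips rows marked FAIL because they have no old figure. `DictRamRatios`, `parseSize` and `ratio` are hypothetical helpers written only for illustration. They are not Lucene APIs.

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of Lucene): parses rows of the JIRA table,
 * e.g. "|af_ZA.zip|18 MB|917.1 KB|", and computes old RAM / new RAM.
 */
public class DictRamRatios {

    /** Converts "917.1 KB" or "18 MB" to bytes, assuming binary units (1 KB = 1024 bytes). */
    static double parseSize(String s) {
        String[] parts = s.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("unexpected size: " + s);
        }
        double value = Double.parseDouble(parts[0]);
        switch (parts[1]) {
            case "KB": return value * 1024;
            case "MB": return value * 1024 * 1024;
            case "GB": return value * 1024 * 1024 * 1024;
            default: throw new IllegalArgumentException("unknown unit: " + parts[1]);
        }
    }

    /** Returns old/new for one row, or NaN when the old column says FAIL. */
    static double ratio(String row) {
        // "|af_ZA.zip|18 MB|917.1 KB|" splits into ["", "af_ZA.zip", "18 MB", "917.1 KB"];
        // String.split drops the trailing empty string but keeps the leading one.
        String[] cells = row.split("\\|");
        if (cells.length < 4) {
            throw new IllegalArgumentException("not a table row: " + row);
        }
        if (cells[2].trim().equals("FAIL")) {
            return Double.NaN;
        }
        return parseSize(cells[2]) / parseSize(cells[3]);
    }

    public static void main(String[] args) {
        // A few rows copied from the table above.
        String[] rows = {
            "|af_ZA.zip|18 MB|917.1 KB|",
            "|de_DE_comb.zip|102.2 MB|4.2 MB|",
            "|el_GR.zip|74.3 MB|1 MB|",
            "|pt_BR.zip|FAIL|2.1 MB|",
        };
        for (String row : rows) {
            String name = row.split("\\|")[1];
            double r = ratio(row);
            if (Double.isNaN(r)) {
                System.out.printf(Locale.ROOT, "%-16s no old figure (FAIL)%n", name);
            } else {
                System.out.printf(Locale.ROOT, "%-16s %.1fx smaller%n", name, r);
            }
        }
    }
}
```

For example, de_DE_comb.zip works out to roughly 24x smaller (102.2 MB / 4.2 MB). Parsing the wiki rows directly keeps the numbers tied to the table rather than retyped by hand.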
> Hunspell very high memory use when loading dictionary
> -----------------------------------------------------
>
> Key: LUCENE-5468
> URL: https://issues.apache.org/jira/browse/LUCENE-5468
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 3.5
> Reporter: Maciej Lisiewski
> Priority: Minor
> Attachments: patch.txt
>
>
> The Hunspell stemmer requires a disproportionately large amount of memory to load
> dictionary/rules files.
> For example, loading a 4.5 MB Polish dictionary (with an empty index!) causes the
> whole core to crash with various out-of-memory errors unless the max heap size is
> set close to 2 GB or more.
> By comparison, Stempel using the same dictionary file works fine with 1/8 of that
> (and possibly even less).
> Sample error log entries:
> http://pastebin.com/fSrdd5W1
> http://pastebin.com/Lmi0re7Z
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)