Hi,

If the index is about 750 GiB on disk, then roughly 2.3 GB of heap space 
for the FSTs seems fine. What is a bit strange is that you only have 10 
million documents!
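
To put the histogram numbers into perspective, here is a small Python sketch that parses a few rows copied from the jmap output quoted below and computes the average size per instance. It is only arithmetic on the numbers posted in this thread, not a diagnosis; the function names (parse_histo, summarize) are made up for illustration.

```python
import re

# A few rows copied from the jmap histogram quoted in this thread.
HISTO = """\
   1:       4346283     2294837424  [Lorg.apache.lucene.util.fst.FST$Arc;
   7:       4346283      452013432  org.apache.lucene.util.fst.FST
   9:       4346283      347702640  org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
  17:       4346283      173851320  org.apache.lucene.util.fst.BytesStore
"""

# Matches "rank: instances bytes class-name" rows of `jmap -histo` output.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_histo(text):
    """Return a list of (class_name, instances, total_bytes) tuples."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append((m.group(3), int(m.group(1)), int(m.group(2))))
    return rows


def summarize(rows):
    """Map class name -> (instances, total_bytes, average_bytes_per_instance)."""
    return {name: (count, total, total / count) for name, count, total in rows}


if __name__ == "__main__":
    for name, (count, total, avg) in summarize(parse_histo(HISTO)).items():
        print(f"{name}: {count} instances, "
              f"{total / 2**30:.2f} GiB, {avg:.0f} bytes each")
```

Dividing the byte total by the instance count shows whether the heap goes to a few large objects or to many small ones, which may help when interpreting the histogram.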

Are those documents huge, with lots of indexed text content, possibly 
OCR/scanned material? If so, the term dictionary may become very large 
because of the many misspelled terms.

Please also send us the output of "ls -lh" for your index directory so we can make a better guess.
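
If the listing is long, a per-extension summary may be easier to read. The following is a minimal Python sketch under a few assumptions: the index lives in a single flat directory, and the helper name size_by_extension is hypothetical. As far as I know, the 4.x block-tree terms format keeps the term dictionary in .tim files and the terms index (the FSTs) in .tip files, but if compound files (.cfs) are enabled those sizes will not be visible separately.

```python
import os
import sys
from collections import defaultdict


def size_by_extension(index_dir):
    """Sum file sizes in index_dir, grouped by file extension.

    Files without an extension (for example segments_N) are keyed by
    their full name.
    """
    totals = defaultdict(int)
    for entry in os.scandir(index_dir):
        if entry.is_file():
            ext = os.path.splitext(entry.name)[1] or entry.name
            totals[ext] += entry.stat().st_size
    return dict(totals)


if __name__ == "__main__":
    # Directory to inspect; defaults to the current directory.
    index_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    totals = size_by_extension(index_dir)
    # Print the largest groups first.
    for ext, size in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{ext:>12}  {size / 2**20:10.1f} MiB")
```

Run it with the index directory as the only argument and include the output in your reply together with the plain "ls -lh" listing.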

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: dawn breaks [mailto:2005dawnbre...@gmail.com]
> Sent: Thursday, January 11, 2018 3:40 AM
> To: java-user@lucene.apache.org
> Subject: Lucene OOM
> 
> Hi all,
>   We have a search engine service built with Lucene 4.7. It seems that
> Lucene eats too much memory. We have approximately 10 million
> documents, and the index size on disk is approximately 750G. My question
> is: why do the FST$Arc objects consume so much memory? Please refer to
> the following jmap histogram. I hope somebody can give me some
> suggestions.
> 
>  num     #instances         #bytes  class name
> ----------------------------------------------
>    1:       4346283     2294837424  [Lorg.apache.lucene.util.fst.FST$Arc;
>    2:      25918804     2023475632  [C
>    3:      17450041     1014051416  [B
>    4:      25878734      621089616  java.lang.String
>    5:      18634803      596313696  java.util.HashMap$Node
>    6:      14039862      561594480  java.util.TreeMap$Entry
>    7:       4346283      452013432  org.apache.lucene.util.fst.FST
>    8:       4522836      424741520  [Ljava.util.HashMap$Node;
>    9:       4346283      347702640  org.apache.lucene.codecs.BlockTreeTermsReader$FieldReader
>   10:       4683616      337220352  org.apache.lucene.util.fst.FST$Arc
>   11:      12947467      310739208  org.apache.lucene.util.BytesRef
>   12:        790283      280383040  [J
>   13:       4359111      245496264  [Ljava.lang.Object;
>   14:       4545337      218176176  java.util.HashMap
>   15:       4510384      216498432  org.apache.lucene.index.FieldInfo
>   16:       4359066      199713232  [I
>   17:       4346283      173851320  org.apache.lucene.util.fst.BytesStore
>   18:       4510400      144332800  java.util.Collections$UnmodifiableMap
>   19:       4354347      104504328  java.util.ArrayList
>   20:       5736589       91785424  java.lang.Integer
>   21:        822685       59233320  org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$NumericEntry
>   22:        428313       13706016  org.apache.lucene.facet.taxonomy.writercache.CollisionMap$Entry
>   23:        420547       13457504  org.wltea.analyzer.dic.DictSegment
>   24:        177039        5665248  [Lorg.wltea.analyzer.dic.DictSegment;
>   25:            20        5112128  [Lorg.apache.lucene.facet.taxonomy.writercache.CollisionMap$Entry;
>   26:         42454        2377424  org.apache.lucene.store.RAMInputStream
>   27:         50054        2002160  org.apache.lucene.util.packed.Packed64
>   28:         44036        1761440  org.apache.lucene.util.packed.DirectPackedReader
>   29:         33013        1056416  java.util.concurrent.ConcurrentHashMap$Node
>   30:         43957        1054968  org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer$2
> 
> 
> 
> 
> Thanks & Best Regards!
> lubin

