Re: DocValue on Strings slow and OOM

Per Steffensen Sat, 16 Nov 2013 05:29:21 -0800

Please note that, for now, this problem is no longer relevant for us, and we will change our c-field from type string (docValue) to type long (docValue). Faceting on huge numbers of long docValues seems to perform very well, except for https://issues.apache.org/jira/browse/SOLR-5444, but we have handled that now.

I would like to help verify that the string-faceting problem this mailing thread has been about has been fixed in 4.5.1, i.e. that things perform better and without huge memory usage. To be able to do that, I would really like to deploy 4.5.1 on top of my 12 billion documents indexed with 4.4.0. Can anyone confirm that I ought to be able to do that? I have tried briefly but ran into problems. When trying to start Solr it says


[2013-11-08 17:45:48,829]ERROR [coreLoadExecutor-4-thread-19] [logid: ] - org.apache.solr.common.SolrException.log(SolrException.java:119) -null:org.apache.solr.common.SolrException: Unable to create core: mycoll_shard13_replica1
        at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:934)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:566)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
        at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
        at java.util.concurrent.FutureTask.run(FutureTask.java:138)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
        at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
        ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
        at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
        at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
        ... 13 more
Caused by: org.apache.lucene.index.CorruptIndexException: Unknown format: 12, input=MMapIndexInput(path="/usr/lib/solr/data/mycoll_shard13_replica1/data/index/_1k63_Disk_0.dvdm")
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.readNumericEntry(Lucene45DocValuesProducer.java:207)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.readFields(Lucene45DocValuesProducer.java:120)
        at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.<init>(Lucene45DocValuesProducer.java:85)
        at org.apache.lucene.codecs.diskdv.DiskDocValuesProducer.<init>(DiskDocValuesProducer.java:31)
        at org.apache.lucene.codecs.diskdv.DiskDocValuesFormat.fieldsProducer(DiskDocValuesFormat.java:56)
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:215)
        at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:300)
        at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:140)
        at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
        at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
        at org.apache.lucene.index.ReadersAndLiveDocs.getReadOnlyClone(ReadersAndLiveDocs.java:217)
        at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
        at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:379)
        at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
        at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:41)
        at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1443)
        ... 15 more

Besides that, see comments below

On 11/14/13 7:54 PM, Joel Bernstein wrote:

Per,
As you are seeing, there are different implementations for calculating facets for numeric fields and string fields. The numeric fields, I believe, are using an int-to-int or long-to-int hashmap to hold the facet counts. This map grows as values are added to it. The String version uses an int array the size of the number of distinct values in the field to hold the facet counts. So if you have a very large number of distinct values in the field, you'll have a very large array.

Do not think this part is a problem
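
To make the two counting strategies described above concrete, here is a minimal sketch in plain Java. It is not Solr's actual code: the class and method names are made up for illustration, and a boxed HashMap stands in for the primitive int-to-int or long-to-int map Joel mentions. It only shows that the array is sized by the number of distinct values in the field, while the map is sized by the number of values actually seen.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of these names exist in Solr.
public class FacetCountSketch {

    // String-style counting: one counter slot per distinct value, indexed
    // by the value's ordinal. The array length is fixed by the number of
    // distinct values, no matter how few documents match.
    public static int[] countWithArray(int[] ordPerDoc, int numDistinctValues) {
        int[] counts = new int[numDistinctValues];
        for (int ord : ordPerDoc) {
            counts[ord]++;
        }
        return counts;
    }

    // Numeric-style counting: a map that only gets an entry for values
    // actually seen among the matching documents, so it grows as values
    // are added to it.
    public static Map<Long, Integer> countWithMap(long[] valuePerDoc) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long value : valuePerDoc) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Three matching documents in a field with one million distinct values.
        int[] dense = countWithArray(new int[] {7, 7, 42}, 1_000_000);
        System.out.println("array slots allocated: " + dense.length);

        Map<Long, Integer> sparse = countWithMap(new long[] {7L, 7L, 42L});
        System.out.println("map entries allocated: " + sparse.size());
    }
}
```

With only a handful of matching documents the array still has a slot for every distinct value, whereas the map holds just the values it encountered.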

Also the distinct values themselves are held in memory in the fieldCache for string fields.

Yes, that is probably a problem

Also note https://dl.dropboxusercontent.com/u/25718039/mem-dump-while-searching-on-facet.field-c_dstr_doc_sto.png and my comments on it in a mail earlier in this thread.
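
As a rough back-of-envelope for the two costs discussed above (the counts array versus the distinct values themselves), the following sketch may help. All numbers in it are hypothetical assumptions, not measurements from this index: the distinct-value count, the average value length, and the per-value overhead are placeholders to tune. Whether the values actually end up on the heap depends on the Solr version and the docValues setup.

```java
// Hypothetical estimate only; the inputs are assumptions, not measurements.
public class FacetMemoryEstimate {

    // Bytes for an int[] holding one counter per distinct value
    // (4 bytes per int; the small array header is ignored).
    public static long countsArrayBytes(long distinctValues) {
        return distinctValues * Integer.BYTES;
    }

    // Rough bytes for keeping the distinct values themselves in memory.
    // avgValueBytes and perValueOverheadBytes are assumed figures.
    public static long valuesBytes(long distinctValues, int avgValueBytes,
                                   int perValueOverheadBytes) {
        return distinctValues * (avgValueBytes + perValueOverheadBytes);
    }

    public static void main(String[] args) {
        long distinct = 100_000_000L;   // assumed distinct values in one core
        int avgValueBytes = 20;         // assumed average string length in bytes
        int overheadBytes = 8;          // assumed bookkeeping per value

        System.out.println("counts array bytes: " + countsArrayBytes(distinct));
        System.out.println("values bytes:       "
                + valuesBytes(distinct, avgValueBytes, overheadBytes));
    }
}
```

Under these assumed inputs the values cost several times more than the counters, which is in line with the counts array being the smaller concern.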

So, basically, as you are seeing, you'll take up a much larger memory footprint when faceting on a high-cardinality string field than on a high-cardinality numeric field.
There are docvalues faceting implementations that will kick in on a field that has docvalues. You can try setting the on disk flag

Believe I did that for my string field "c_dstr_doc_sto"?
From schema.xml

<dynamicField name="*_dstr_doc_sto" type="dstring" indexed="false" stored="true" required="true" docValues="true"/>
<dynamicField name="*_lng_ind_sto" type="long" indexed="true" stored="true"/>
<dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false" stored="true" required="true" docValues="true"/>

...

<fieldType name="dstring" class="solr.StrField" sortMissingLast="true" docValuesFormat="Disk"/>
<fieldType name="dlng" class="solr.TrieLongField" precisionStep="0" positionIncrementGap="0" docValuesFormat="Disk"/>


Did I miss something?

and this will test memory and performance.

Joel


On Thu, Nov 14, 2013 at 8:13 AM, Per Steffensen wrote:


    If anyone is following this one, just an update. We are not going
    to upgrade to 4.5.1 in order to see if the String facet
    performance problem has been fixed. Instead we have made a few
    hacks around our data so that we can store the c-field
    (c_dstr_doc_sto) as long instead (c_dlng_doc_sto). So now we only
    need to struggle with long-facet performance. There is a
    performance issue with facets on longs though, but I will write
    about it in another mailing thread, since I need your input on
    which solution you prefer.

https://issues.apache.org/jira/browse/SOLR-5444



    Regards, Per Steffensen
