Please note that this problem is no longer relevant for us: we will
change our c-field from type string (docValues) to type long
(docValues). Faceting on huge numbers of long docValues seems to
perform very well, except for
https://issues.apache.org/jira/browse/SOLR-5444, but we have handled
that now.
I would still like to help verify that the string-faceting problem this
mailing thread has been about is fixed in 4.5.1, that is, that
performance is better and memory usage is no longer huge. To do that, I
would really like to be able to deploy 4.5.1 on top of my 12 billion
documents indexed with 4.4.0. Can anyone confirm that I ought to be able
to do that? I tried briefly but ran into problems.
When trying to start Solr it says:
[2013-11-08 17:45:48,829]ERROR [coreLoadExecutor-4-thread-19] [logid: ] - org.apache.solr.common.SolrException.log(SolrException.java:119) -null:org.apache.solr.common.SolrException: Unable to create core: mycoll_shard13_replica1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:934)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:566)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:247)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:239)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:834)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:625)
    at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:256)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:555)
    ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1477)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1589)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:821)
    ... 13 more
Caused by: org.apache.lucene.index.CorruptIndexException: Unknown format: 12, input=MMapIndexInput(path="/usr/lib/solr/data/mycoll_shard13_replica1/data/index/_1k63_Disk_0.dvdm")
    at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.readNumericEntry(Lucene45DocValuesProducer.java:207)
    at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.readFields(Lucene45DocValuesProducer.java:120)
    at org.apache.lucene.codecs.lucene45.Lucene45DocValuesProducer.<init>(Lucene45DocValuesProducer.java:85)
    at org.apache.lucene.codecs.diskdv.DiskDocValuesProducer.<init>(DiskDocValuesProducer.java:31)
    at org.apache.lucene.codecs.diskdv.DiskDocValuesFormat.fieldsProducer(DiskDocValuesFormat.java:56)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsReader.<init>(PerFieldDocValuesFormat.java:215)
    at org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat.fieldsProducer(PerFieldDocValuesFormat.java:300)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:140)
    at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
    at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:121)
    at org.apache.lucene.index.ReadersAndLiveDocs.getReadOnlyClone(ReadersAndLiveDocs.java:217)
    at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:100)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:379)
    at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:111)
    at org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:41)
    at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1443)
    ... 15 more
Besides that, see my comments below.
On 11/14/13 7:54 PM, Joel Bernstein wrote:
Per,
As you are seeing there are different implementations for calculating
facets for numeric fields and string fields. The numeric fields I
believe are using an int-to-int or long-to-int hashmap to hold the
facet counts. This map grows as values are added to it. The String
version uses an int array the size of the number of distinct values in
the field to hold the facet counts. So if you have a very large number
of distinct values in the field, you'll have a very large array.
I do not think this part is a problem.
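To make the two counting strategies described above concrete, here is a minimal Python sketch. It is only an illustration of the idea, not Solr's actual code: the function names are made up for this example, and the 4 bytes per count is an assumption based on the "int array" wording.

```python
from array import array
from collections import defaultdict


def facet_counts_hashmap(values):
    """Numeric-style counting: a map that grows only as new values are seen."""
    counts = defaultdict(int)
    for v in values:
        counts[v] += 1
    return dict(counts)


def facet_counts_ordinal_array(ordinals, num_distinct):
    """String-style counting: one int slot per distinct value in the field.

    `ordinals` are the per-document term ordinals (0 .. num_distinct - 1).
    The whole array is allocated up front, even if the query only
    touches a handful of distinct values.
    """
    counts = array("i", [0]) * num_distinct
    for ordinal in ordinals:
        counts[ordinal] += 1
    return counts


def ordinal_array_bytes(num_distinct, bytes_per_count=4):
    """Rough size of the counts array (assumes 4-byte ints)."""
    return num_distinct * bytes_per_count
```

For example, under these assumptions a field with one billion distinct values would need roughly ordinal_array_bytes(1_000_000_000) = 4,000,000,000 bytes for the counts array alone, per request, regardless of how many documents the query matches. The map-based variant only pays for the values it actually encounters.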
Also the distinct values themselves are held in memory in the
fieldCache for string fields.
Yes, that is probably a problem.
Also note
https://dl.dropboxusercontent.com/u/25718039/mem-dump-while-searching-on-facet.field-c_dstr_doc_sto.png
and my comments on it in a mail earlier in this thread.
So, basically, as you are seeing, you'll take up a much larger memory
footprint when faceting on a high cardinality string field than
on a high cardinality numeric field.
There are docValues faceting implementations that will kick in on a
field that has docValues. You can try setting the on-disk flag
I believe I already did that for my string field "c_dstr_doc_sto"?
From schema.xml:
<dynamicField name="*_dstr_doc_sto" type="dstring"
indexed="false" stored="true" required="true" docValues="true"/>
<dynamicField name="*_lng_ind_sto" type="long" indexed="true"
stored="true"/>
<dynamicField name="*_dlng_doc_sto" type="dlng" indexed="false"
stored="true" required="true" docValues="true"/>
...
<fieldType name="dstring" class="solr.StrField"
sortMissingLast="true" docValuesFormat="Disk"/>
<fieldType name="dlng" class="solr.TrieLongField" precisionStep="0"
positionIncrementGap="0" docValuesFormat="Disk"/>
Did I miss something?
and this will test memory and performance.
Joel
On Thu, Nov 14, 2013 at 8:13 AM, Per Steffensen <[email protected]> wrote:
If anyone is following this one, just an update. We are not going
to upgrade to 4.5.1 in order to see if the String facet
performance problem has been fixed. Instead we have made a few
hacks around our data so that we can store the c-field
(c_dstr_doc_sto) as long instead (c_dlng_doc_sto). So now we only
need to struggle with long-facet performance. There is a
performance issue with facets on longs, though; I will describe
it in another mailing thread, since I need your input on which
solution you prefer.
https://issues.apache.org/jira/browse/SOLR-5444
Regards, Per Steffensen