Per,

As you are seeing, there are different implementations for calculating
facets for numeric fields and string fields. The numeric fields, I believe,
use an int-to-int or long-to-int hash map to hold the facet counts.
This map grows as values are added to it. The string version uses an int
array, sized to the number of distinct values in the field, to hold the
facet counts. So if you have a very large number of distinct values in the
field, you'll have a very large array. Also, the distinct values themselves
are held in memory in the fieldCache for string fields.

So, basically, as you are seeing, you'll take up a much larger memory
footprint when faceting on a high-cardinality string field than on a
high-cardinality numeric field.
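
To make the difference concrete, here is a rough sketch of the two counting
strategies. This is not the actual Solr code; the class and method names are
made up for illustration, and it uses a plain java.util.HashMap where the
real implementation would use a primitive map. It only shows why the array
scales with the number of distinct values in the field, while the map scales
with the values actually seen in the matching documents.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not taken from Solr.
class FacetCountSketch {

    // Numeric-style counting: the map starts empty and grows only as
    // values from the matching documents are added to it.
    static Map<Long, Integer> countWithMap(long[] valuesOfMatchingDocs) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long value : valuesOfMatchingDocs) {
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    // String-style counting: one int slot per distinct value in the whole
    // field, allocated up front, no matter how few documents match.
    // Each document contributes the ordinal of its value.
    static int[] countWithOrdinalArray(int[] ordinalsOfMatchingDocs,
                                       int distinctValuesInField) {
        int[] counts = new int[distinctValuesInField];
        for (int ordinal : ordinalsOfMatchingDocs) {
            counts[ordinal]++;
        }
        return counts;
    }
}
```

With three matching documents, the map ends up holding at most three
entries, while the array is as long as the number of distinct values in the
field. On top of that, the string case also keeps the distinct values
themselves in memory.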

There are docValues faceting implementations that will kick in on a field
that has docValues. You can try setting the on-disk flag and see how that
affects memory and performance.

Joel




On Thu, Nov 14, 2013 at 8:13 AM, Per Steffensen <st...@designware.dk> wrote:

>  If anyone is following this one, just an update. We are not going to
> upgrade to 4.5.1 in order to see if the String facet performance problem
> has been fixed. Instead we have made a few hacks around our data so that we
> can store the c-field (c_dstr_doc_sto) as long instead (c_dlng_doc_sto). So
> now we only need to struggle with long-facet performance. There is a
> performance issue with facets on longs though, but I will tell about it in
> another mailing-thread - need your input on what solution you prefer.
>
> Regards, Per Steffensen
>
>
> On 11/6/13 12:15 PM, Per Steffensen wrote:
>
> On 11/6/13 11:43 AM, Robert Muir wrote:
>
> Before lucene 4.5 docvalues were loaded entirely into RAM.
>
> I'm not going to waste time debugging any old code releases here, you
> should upgrade to the latest release!
>
>  Ok, thanks!
>
> I do not consider it a bug (just a performance issue), so no debugging
> needed.
> It is just that we do not want to spend time upgrading to 4.5 if there is
> not a justified hope/explanation that it will probably make things
> better. But I guess there is.
>
> One short question: Will 4.5 index things differently (compared to 4.4)
> for documents with fields like I showed earlier? I'm basically asking if we
> need to reindex the 12 billion documents again after upgrading to 4.5, or if
> we ought to be able to deploy 4.5 on top of the already indexed documents.
>
> Regards, Per Steffensen
>
>
>


-- 
Joel Bernstein
Search Engineer at Heliosearch
