Forget about the quoted comment at the bottom of this mail. It is not
true: both the fast/efficient and the slow/memory-consuming query follow
the getTermCounts path.
But I have identified another place where they take different paths in
the code. In SimpleFacets.getTermCounts you will find the code below. I
have pointed out where the two queries go.
    if (params.getFieldBool(field, GroupParams.GROUP_FACET, false)) {
      counts = getGroupedCounts(searcher, docs, field, multiToken,
          offset, limit, mincount, missing, sort, prefix);
    } else {
      assert method != null;
      switch (method) {
        case ENUM:
          assert TrieField.getMainValuePrefix(ft) == null;
          counts = getFacetTermEnumCounts(searcher, docs, field,
              offset, limit, mincount, missing, sort, prefix);
          break;
        case FCS:
          assert !multiToken;
          if (ft.getNumericType() != null && !sf.multiValued()) {
*** ---> The fast/efficient query (facet.field=a_dlng_doc_sto) goes here
            // force numeric faceting
            if (prefix != null && !prefix.isEmpty()) {
              throw new SolrException(ErrorCode.BAD_REQUEST,
                  FacetParams.FACET_PREFIX + " is not supported on numeric types");
            }
            counts = NumericFacets.getCounts(searcher, docs, field,
                offset, limit, mincount, missing, sort);
          } else {
            PerSegmentSingleValuedFaceting ps =
                new PerSegmentSingleValuedFaceting(searcher, docs, field,
                    offset, limit, mincount, missing, sort, prefix);
            Executor executor = threads == 0 ? directExecutor : facetExecutor;
            ps.setNumThreads(threads);
            counts = ps.getFacetCounts(executor);
          }
          break;
        case FC:
          if (sf.hasDocValues()) {
*** ---> The slow/memory-consuming query (facet.field=c_dstr_doc_sto) goes here
            counts = DocValuesFacets.getCounts(searcher, docs, field,
                offset, limit, mincount, missing, sort, prefix);
          } else if (multiToken || TrieField.getMainValuePrefix(ft) != null) {
            UnInvertedField uif =
                UnInvertedField.getUnInvertedField(field, searcher);
            counts = uif.getCounts(searcher, docs, offset, limit,
                mincount, missing, sort, prefix);
          } else {
            counts = getFieldCacheCounts(searcher, docs, field,
                offset, limit, mincount, missing, sort, prefix);
          }
          break;
        default:
          throw new AssertionError();
      }
    }
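To make the two paths easier to compare, here is a small stand-alone
sketch of the branching above. It is not Solr code: the class
FacetPathSketch, the method choosePath and its boolean parameters are
hypothetical stand-ins for the checks in the excerpt, and it only returns
the name of the counting routine that would be picked. The field
properties used in main (a_dlng_doc_sto as a single-valued numeric field
arriving with FCS, c_dstr_doc_sto as a docValues field arriving with FC)
are what the markers above imply; whether a_dlng_doc_sto also has
docValues is an assumption and is not consulted on its branch.

```java
// Simplified, hypothetical model of the branching in the excerpt above.
// It only returns the name of the counting routine that would be chosen.
final class FacetPathSketch {
    enum Method { ENUM, FCS, FC }

    static String choosePath(Method method, boolean groupFacet, boolean numericType,
                             boolean multiValued, boolean hasDocValues,
                             boolean multiToken, boolean triePrefix) {
        if (groupFacet) {
            return "getGroupedCounts";
        }
        switch (method) {
            case ENUM:
                return "getFacetTermEnumCounts";
            case FCS:
                // Single-valued numeric fields are forced onto numeric faceting.
                return (numericType && !multiValued)
                        ? "NumericFacets.getCounts"
                        : "PerSegmentSingleValuedFaceting.getFacetCounts";
            case FC:
                if (hasDocValues) {
                    return "DocValuesFacets.getCounts";
                } else if (multiToken || triePrefix) {
                    return "UnInvertedField.getCounts";
                }
                return "getFieldCacheCounts";
            default:
                throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        // a_dlng_doc_sto: single-valued numeric field, method FCS
        // (hasDocValues=true is an assumption; it is not consulted here).
        System.out.println(choosePath(Method.FCS, false, true, false, true, false, false));
        // c_dstr_doc_sto: non-numeric field with docValues, method FC.
        System.out.println(choosePath(Method.FC, false, false, false, true, false, false));
    }
}
```

In this model the two fields differ both in the method they arrive with
and in the per-field checks, which matches the two markers in the
excerpt.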
I also believe I have found where the huge memory allocation is done.
I took a memory dump while the slow/memory-consuming c_dstr_doc_sto query
was running (plenty of time to do that: 100+ secs). It seems that a lot
of memory is allocated under SlowCompositeReaderWrapper.cachedOrdMaps,
which holds HashMaps with MultiDocValues$OrdinalMaps as values. Those
MultiDocValues$OrdinalMaps have an ordDeltas field, an array of
MonotonicAppendingLongBuffers, which (a few levels further down) contain
Packed64 instances containing long arrays.
See
https://dl.dropboxusercontent.com/u/25718039/mem-dump-while-searching-on-facet.field-c_dstr_doc_sto.png
SlowCompositeReaderWrapper and all this memory allocation do not seem
to be involved in the fast a_dlng_doc_sto query.
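As a rough illustration of why such ordinal maps can get large, here is
a toy sketch of a per-segment "local ordinal -> global ordinal" mapping.
This is not the Lucene implementation: as far as I understand,
MultiDocValues$OrdinalMap stores deltas in packed buffers (the ordDeltas
seen in the dump), whereas this sketch uses plain long arrays, and the
names OrdinalMapSketch, buildGlobalOrds and entryCount are made up for
the example. The point is only that the mapping needs one entry per
unique term per segment, so a high-cardinality string field spread over
many segments means many entries.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy model (not Lucene code): maps segment-local term ordinals to
// global ordinals over the sorted union of all segments' terms.
final class OrdinalMapSketch {

    // segmentTerms.get(s) is assumed to be the sorted, unique terms of segment s.
    static long[][] buildGlobalOrds(List<List<String>> segmentTerms) {
        // The position of a term in the sorted union is its global ordinal.
        TreeSet<String> union = new TreeSet<>();
        for (List<String> terms : segmentTerms) {
            union.addAll(terms);
        }
        Map<String, Long> globalOrd = new HashMap<>();
        long next = 0;
        for (String term : union) {
            globalOrd.put(term, next++);
        }
        // One long per (segment, local ordinal) pair.
        long[][] maps = new long[segmentTerms.size()][];
        for (int s = 0; s < segmentTerms.size(); s++) {
            List<String> terms = segmentTerms.get(s);
            maps[s] = new long[terms.size()];
            for (int ord = 0; ord < terms.size(); ord++) {
                maps[s][ord] = globalOrd.get(terms.get(ord));
            }
        }
        return maps;
    }

    // Total number of mapping entries held across all segments.
    static long entryCount(long[][] maps) {
        long total = 0;
        for (long[] m : maps) {
            total += m.length;
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> segments = new ArrayList<>();
        segments.add(List.of("a", "c", "e"));
        segments.add(List.of("b", "c", "d"));
        segments.add(List.of("a", "f"));
        long[][] maps = buildGlobalOrds(segments);
        System.out.println(entryCount(maps)); // prints 8 (3 + 3 + 2 entries)
    }
}
```

With mostly-unique string values the number of entries approaches the
number of documents, which would be consistent with the large long
arrays in the dump.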
Does this information provide any leads on how to fix the
response-time/memory-consumption issue? Maybe it helps tell whether
upgrading to 4.5 will fix the issue?
Regards, Per Steffensen
On 11/5/13 1:47 PM, Per Steffensen wrote:
Looking at thread dumps:
It seems like one of the major differences in what is done for
c_dstr_doc_sto vs a_dlng_doc_sto is in
SimpleFacets.getFacetFieldCounts, where c_dstr_doc_sto takes the
"getTermCounts" path and a_dlng_doc_sto takes the
"getListedTermCounts" path.
    String termList = localParams == null ? null :
        localParams.get(CommonParams.TERMS);
    if (termList != null) {
      res.add(key, getListedTermCounts(facetValue, termList));
    } else {
      res.add(key, getTermCounts(facetValue));
    }
getTermCounts seems to do a lot more, and to be a lot more complex, than
getListedTermCounts.