Re: New facet module

Toke Eskildsen Mon, 11 Jul 2011 00:05:39 -0700

On Sat, 2011-07-09 at 05:44 +0200, Shai Erera wrote:
> The taxonomy is global to the index, but I think it will be
> interesting to explore per-segment taxonomy, and how it can be used to
> improve indexing or search perf (hopefully both).


I have struggled with this for some time and still haven't found a real
solution. Distributed faceting, with segment-based faceting as a special
case, is hard to do without a central taxonomy.

The new faceting module is explicit about the central taxonomy. My
experiments with https://issues.apache.org/jira/browse/LUCENE-2369
compute it at index open time. Neither approach works very well, if at
all, in a truly distributed environment.

The problem is the same for flat faceting but is magnified with
hierarchical faceting: When the sorting order of facet elements is
popularity based, computing the correct counts for a top-X can require
comparing the whole result from each part.

A pathological case for flat faceting is
Part 1: A1(2), A2(2)... An(2)
Part 2: B1(3), B2(2), B3(2)... Bn(2), An(1)
where the correct top 3 answer is An(3), B1(3), A2(2). Getting there
requires the full result from each part, as An(2) and An(1) are the
last elements in their respective lists.
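
To make the case concrete, here is a minimal Python sketch of the merge.
It is only an illustration, not code from the facet module or from
LUCENE-2369: build_parts, merge_top and the cutoff parameter are names
made up for this example, and ties are broken alphabetically, so the
third element comes out as A1(2) rather than A2(2).

```python
from collections import Counter

def build_parts(n):
    """Build the two part results from the example (hypothetical helper)."""
    # Part 1: A1..An, each with count 2.
    part1 = [(f"A{i}", 2) for i in range(1, n + 1)]
    # Part 2: B1(3), B2..Bn with count 2, and An(1) as the very last entry.
    part2 = [("B1", 3)] + [(f"B{i}", 2) for i in range(2, n + 1)] + [(f"A{n}", 1)]
    return part1, part2

def merge_top(parts, top, cutoff=None):
    """Merge per-part counts and return the overall top elements.

    cutoff=None merges the full result of every part; an integer cutoff
    keeps only the first `cutoff` entries of each part before merging.
    """
    total = Counter()
    for part in parts:
        # Each part delivers its entries in popularity order; the stable
        # sort keeps tied entries in the order they were listed.
        ranked = sorted(part, key=lambda entry: -entry[1])
        for term, count in ranked[:cutoff]:
            total[term] += count
    # Highest count first, ties broken alphabetically (an assumption).
    return sorted(total.items(), key=lambda entry: (-entry[1], entry[0]))[:top]

part1, part2 = build_parts(10)
exact = merge_top([part1, part2], top=3)             # full merge
cut = merge_top([part1, part2], top=3, cutoff=3)     # only top 3 per part
print(exact)
print(cut)
```

With n = 10 the full merge puts A10(3) and B1(3) on top, while the merge
of only the top 3 from each part never sees A10 at all.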

For real world use, we can do clever counting so that we only return
what is necessary, but it does not change the worst case. To ensure that
we never end up merging millions of entries, we must cheat and set a
cutoff point.

With a multi-level faceting result (state/town/street expanded to top 5
elements on all levels) we must resolve quite a lot of elements to
ensure a high chance of getting the right elements with the right
counts. We can avoid this by drilling down one level at a time, but that
just replaces bulk transfers with multiple requests: 1*5*5 is the
unrealistically low minimum for the address case.
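
A rough way to count the requests, as a Python sketch. The assumption
here (mine, for illustration) is one request per expanded parent at each
level; drilldown_requests is a hypothetical helper. Read that way,
1*5*5 is the number of requests needed for the street level alone, and
each of them still goes to every part.

```python
def drilldown_requests(levels, top):
    """Count drill-down requests per level and in total.

    Assumes one request per expanded parent element, so level d
    (starting from 0) needs top**d requests.
    """
    per_level = [top ** depth for depth in range(levels)]
    return per_level, sum(per_level)

# state/town/street with top 5 on every level
per_level, total = drilldown_requests(levels=3, top=5)
print(per_level, total)
```

For state/town/street with top 5 this gives 1, 5 and 25 requests per
level, 31 in total.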

- Toke

