Hi!

I'd like to respond to this point:

> 5. Can someone imagine situation when more than one Analyzers are used in
> an application?

Not only can I imagine such a situation, but I'd also strongly recommend it
for any high-quality application! If you are only targeting speed and light
CPU usage, sure, a single analyzer is enough. But then your application will
only get the precision/recall it deserves. A good search engine should be
flexible enough to use several analyzers and combine their results to
achieve the best possible recall/precision. For example, say you are
looking for something related to "selling toothbrushes". The application
should retrieve all the occurrences matching exactly "selling toothbrushes"
(using a strict analyzer), but it may also retrieve "sell toothbrush" (using
a stemming normalizer). Why not also retrieve "buy toothbrush" or "sell
dental tools" (some kind of semantic normalizer/analyzer)? One could even
imagine retrieving "Selin Toothbrushies" (a phonetic normalizer).
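
To make the idea concrete, here is a minimal Java sketch (Java only because
Lucene is a Java library). It does not use Lucene's Analyzer API; `strict`,
`stem`, `analyze` and `matches` are hypothetical helpers, and the stemmer is
a crude suffix stripper assumed purely for illustration, far weaker than a
real one. It shows a document containing "sell toothbrush" being missed by
the strict normalizer but found by the stemming one.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class MultiAnalyzerDemo {

    // Strict normalizer: lower-casing only, so "Selling" == "selling".
    public static String strict(String token) {
        return token.toLowerCase(Locale.ROOT);
    }

    // Hypothetical crude stemmer (NOT Porter/Snowball): strips a few English
    // suffixes, e.g. "selling" -> "sell", "toothbrushes" -> "toothbrush".
    public static String stem(String token) {
        String t = strict(token);
        if (t.endsWith("ing") && t.length() > 5) return t.substring(0, t.length() - 3);
        if (t.endsWith("es") && t.length() > 4) return t.substring(0, t.length() - 2);
        if (t.endsWith("s") && t.length() > 3) return t.substring(0, t.length() - 1);
        return t;
    }

    // Splits on whitespace and normalizes each token with the given normalizer.
    public static List<String> analyze(String text, UnaryOperator<String> normalizer) {
        return Arrays.stream(text.trim().split("\\s+"))
                .map(normalizer)
                .collect(Collectors.toList());
    }

    // A document matches if every normalized query token appears among its tokens.
    public static boolean matches(String query, String doc, UnaryOperator<String> normalizer) {
        return analyze(doc, normalizer).containsAll(analyze(query, normalizer));
    }

    public static void main(String[] args) {
        String query = "selling toothbrushes";
        String doc = "we sell toothbrush kits";
        System.out.println("strict:   " + matches(query, doc, MultiAnalyzerDemo::strict)); // false
        System.out.println("stemming: " + matches(query, doc, MultiAnalyzerDemo::stem));   // true
    }
}
```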

Ok, so this increases recall, but unfortunately drastically decreases
precision, right? Wrong: all these analyzers should be ordered, and the
final result should be computed from the results of all those indexes. For
instance, results from the strict-analyzer index should be weighted more
heavily than stemming results, which should be weighted more heavily than
phonetic ones, etc. The reason is simple: the more aggressive the
normalization process, the less likely it is (and the more hazardous it is
to assume) that a match is exactly what the user is looking for. Sure, it's
CPU-intensive, but that is the dilemma of search engines: be fast or be
smart. My belief is that Lucene, as a search engine, should allow both kinds
of application (and I personally prefer smart search engines to fast ones).
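
The ordering idea could be sketched as a weighted sum of per-index scores, a
simple linear fusion that is only one possible scheme and not a Lucene API.
The index names, the weights 1.0/0.5/0.2 and the docId-to-score map format
are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class WeightedFusion {

    // Combines per-index scores into one score per document.
    // resultsByIndex: index name -> (docId -> score), a hypothetical format.
    // weights: index name -> weight; stricter indexes get larger weights.
    public static Map<String, Double> fuse(Map<String, Map<String, Double>> resultsByIndex,
                                           Map<String, Double> weights) {
        Map<String, Double> combined = new HashMap<>();
        for (Map.Entry<String, Map<String, Double>> index : resultsByIndex.entrySet()) {
            double w = weights.getOrDefault(index.getKey(), 0.0);
            for (Map.Entry<String, Double> hit : index.getValue().entrySet()) {
                // Add this index's weighted contribution to the document's total.
                combined.merge(hit.getKey(), w * hit.getValue(), Double::sum);
            }
        }
        return combined;
    }

    // Returns doc IDs sorted by descending combined score.
    public static List<String> rank(Map<String, Double> combined) {
        return combined.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Illustrative weights only: strict > stemming > phonetic.
        Map<String, Double> weights = Map.of("strict", 1.0, "stemming", 0.5, "phonetic", 0.2);

        Map<String, Map<String, Double>> results = new LinkedHashMap<>();
        results.put("strict", Map.of("doc1", 1.0));
        results.put("stemming", Map.of("doc1", 1.0, "doc2", 1.0));
        results.put("phonetic", Map.of("doc1", 1.0, "doc2", 1.0, "doc3", 1.0));

        System.out.println(rank(fuse(results, weights))); // [doc1, doc2, doc3]
    }
}
```

A document found by every index (doc1) outranks one found only by the
aggressive normalizers (doc3), so widening recall does not push exact
matches down. In practice raw scores from different indexes would likely
need normalizing before being summed; the sketch assumes they already lie
in [0, 1].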

Rodrigo
