[KERNEL] Issue #SKER4952

Mick Semb Wever Thu, 21 Aug 2008 06:17:40 -0700

Activity report on

  *[JIRA] New Feature SKER4952 - Solr TokenEvaluator*

Scarab Link: http://sesat.no/scarab/issues/id/SKER4952
Module: Sesat> Kernel

Activity generated by Mick Semb Wever ([EMAIL PROTECTED]) at 08/21/2008 15:18

*Reasons for the changes*

*Comments*
- By Mick Semb Wever - 08/21/2008 15:18 ---
"> Solr removes certain (stop) words.

We do not want any stop words in the Solr list collection.

I'm unsure how Solr arranges collections. Is this installation of Solr
solely dedicated to the token evaluation lists?

> We probably don't want to stem the content in our lists

Correct.

> name:"air crash" returns only those with "air" followed by "crash" ...

Solr's default boolean operator is OR (unlike our FAST installation, which
uses AND).
For the token evaluation lists, OR is exactly what we want.
So the query ==> name:michael name:semb name:wever
returns hits on anything containing any of the names.
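
As a rough illustration of the query shape above, here is a minimal sketch that builds such a query string from the terms. The function name build_or_query is hypothetical, it relies on OR being the default operator as described, and it assumes plain alphanumeric terms that need no query escaping.

```python
def build_or_query(field, terms):
    """Join field:term clauses with spaces.

    With OR as the default boolean operator, space-separated clauses
    match documents containing any one of the terms.
    Assumption: terms are plain words that need no escaping.
    """
    return " ".join(f"{field}:{term}" for term in terms)


if __name__ == "__main__":
    print(build_or_query("name", ["michael", "semb", "wever"]))
```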

What we do want though:
- since _every_ result must be processed we want to minimise the number of
results returned. So reconfiguring for exact matches is important for token
evaluation. But exact matching alone isn't good enough, as a query for
"michael wever" will not return the submatch "michael". A solution is
described in the thread
http://www.gossamer-threads.com/lists/lucene/java-user/62954 .
Can we use this Shingle filter? See
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html

- since SEARCH-4923 & SKER4949 need this Solr collection to work as a normal
matching query server, is it possible to configure and re-index Solr to handle
both? (Could you duplicate the name field, calling it name_exact, and apply
exact matching or the shingle filter to just that copy?)
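
To make the shingle idea from the first point concrete, here is a small pure-Python sketch of word shingles (word n-grams) combined with exact matching against list entries. It is not Lucene's ShingleFilter, only an illustration of the concept; the function names and the sample list entries are made up for the example.

```python
def shingles(tokens, max_size=2):
    """Return every run of 1..max_size consecutive tokens, joined by spaces."""
    out = []
    for start in range(len(tokens)):
        for size in range(1, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


def exact_matches(query, entries, max_size=2):
    """Return the shingles of the query that exactly equal a list entry."""
    return [s for s in shingles(query.split(), max_size) if s in entries]


if __name__ == "__main__":
    # Hypothetical list entries, for illustration only.
    entries = {"michael", "semb wever"}
    print(exact_matches("michael semb wever", entries))
```

With shingles, the query "michael semb wever" is checked as "michael", "michael semb", "semb", "semb wever" and "wever", so the submatch "michael" is found even though only exact comparisons are made.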

Regarding "Staveforslag i søkefeltet" ("Spelling suggestions in the search
field") and the latter point: if enabling both exact and non-exact matching is
not possible, a poorer solution exists: exact matching with a wildcard suffix
on every term. This would have the limitation of only matching single words.
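
A sketch of what the fallback query could look like, assuming the standard trailing * wildcard syntax and plain alphanumeric terms. The function name build_prefix_query is hypothetical, and the field name is whichever field ends up configured for exact matching.

```python
def build_prefix_query(field, terms):
    """Append a trailing wildcard to every term: field:term* field:term* ...

    Assumption: terms are plain words that need no escaping.
    """
    return " ".join(f"{field}:{term}*" for term in terms)


if __name__ == "__main__":
    print(build_prefix_query("name", ["michael", "wever"]))
```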
"

- By Mick Semb Wever - 08/21/2008 15:19 ---
"> Solr removes certain (stop) words.