Activity report on

  *[JIRA] New Feature SKER4952 - Solr TokenEvaluator*

  Scarab Link: http://sesat.no/scarab/issues/id/SKER4952
  Module: Sesat> Kernel


  Activity generated by Mick Semb Wever ([EMAIL PROTECTED]) at 08/21/2008 15:18

  *Reasons for the changes*


  *Comments*
  - By Mick Semb Wever - 08/21/2008 15:18 ---
  "> Solr removes certain (stop) words.

We do not want any stop words in the solr list collection.

I'm unsure as to how solr arranges collections. Is this installation of solr 
solely dedicated to the token evaluation lists?


> We probably don't want to stem the content in our lists

Correct.


> name:"air crash" returns only those with "air" followed by "crash" ...

Solr's default boolean operator is OR (unlike our FAST installation, which uses 
AND).
For the token evaluation lists, OR is exactly what we want.
So the query ==> name:michael name:semb name:wever
 returns hits on anything containing any of the names.
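
To illustrate the OR behaviour, here is a minimal Python sketch. It is not Solr code: build_or_query and matches_any are hypothetical helpers, and the entries are made-up sample data. It only emulates "a hit is anything containing any of the query words".

```python
def build_or_query(field, text):
    """Join one field:term clause per whitespace-separated word.

    With OR as the default operator, the clauses need no explicit operator.
    """
    return " ".join(f"{field}:{term}" for term in text.lower().split())


def matches_any(entry, text):
    """Emulate OR semantics: an entry hits if it contains any query word."""
    entry_words = set(entry.lower().split())
    return any(term in entry_words for term in text.lower().split())


# Made-up sample entries, for illustration only.
entries = ["michael", "semb wever", "air crash", "oslo"]
query = "michael semb wever"

print(build_or_query("name", query))
print([e for e in entries if matches_any(e, query)])
```

With an AND default, as in the FAST installation, none of these sample entries would hit, since no single entry contains all three words.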

What we do want though:
 - since _every_ result must be processed, we want to minimise the number of results 
returned. So reconfiguring for exact matches is important for token evaluation. 
But exact matching alone isn't good enough, as a query for "michael wever" will not 
return the submatch "michael". A solution is described in the thread 
http://www.gossamer-threads.com/lists/lucene/java-user/62954 . 
   Can we use this Shingle filter? See 
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
     

 - SEARCH-4923 & SKER4949 also need this solr collection to work as a normal 
matching query server. Is it possible to configure and re-index solr to handle 
both? (Could you duplicate the name field as name_exact and apply exact 
matching or the shingle filter to that copy only?)
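
The shingle idea from the first point can be sketched in a few lines of Python. This is only an emulation on the query side, under the assumption that list entries are matched exactly. The names shingles, exact_hits and max_size are hypothetical, the sample list is made up, and how the real ShingleFilter would be wired into Solr's analysis chain is not shown here.

```python
def shingles(text, max_size=3):
    """Return every run of 1..max_size consecutive words in text."""
    words = text.lower().split()
    result = []
    for size in range(1, max_size + 1):
        for start in range(len(words) - size + 1):
            result.append(" ".join(words[start:start + size]))
    return result


def exact_hits(query, list_entries, max_size=3):
    """Return the list entries that exactly equal some shingle of the query."""
    wanted = set(shingles(query, max_size))
    return [entry for entry in list_entries if entry.lower() in wanted]


# Made-up sample list, for illustration only.
names = ["michael", "michael wever", "michael jackson", "wever"]

print(shingles("michael wever"))
print(exact_hits("michael wever", names))
```

In this sketch the query "michael wever" still finds the submatch "michael", while "michael jackson" is not returned, which keeps the number of results to process small.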

Regarding "Staveforslag i søkefeltet" (spelling suggestions in the search 
field) and the latter point: if enabling both exact and non-exact matching is 
not possible, a poorer solution exists: exact matching with a wildcard suffix 
on every term. This would have the limitation of only matching single words.
"

_______________________________________________
Kernel-issues mailing list
[email protected]
http://sesat.no/mailman/listinfo/kernel-issues
