Activity report on *[JIRA] New Feature SKER4952 - Solr TokenEvaluator*
Scarab Link: http://sesat.no/scarab/issues/id/SKER4952
Module: Sesat > Kernel
Activity generated by Mick Semb Wever ([EMAIL PROTECTED]) at 08/21/2008 15:18

*Reasons for the changes*

*Comments*

- By Mick Semb Wever - 08/21/2008 15:18 ---

> Solr removes certain (stop) words.

We do not want any stop words in the Solr list collection. I'm unsure how Solr arranges collections. Is this Solr installation dedicated solely to the token evaluation lists?

> We probably don't want to stem the content in our lists

Correct.

> name:"air crash" returns only those with "air" followed by "crash" ...

Solr's default boolean operator is OR (unlike our FAST installation, which uses AND). For the token evaluation lists, OR is exactly what we want. So the query

    name:michael name:semb name:wever

returns hits on anything containing any of the names.

What we do want, though:

- Since _every_ result must be processed, we want to minimise the number of results returned. Reconfiguring for exact matches is therefore important for token evaluation. But exact matching alone isn't good enough, because a query for "michael wever" will not return the submatch "michael". A solution is described in the thread http://www.gossamer-threads.com/lists/lucene/java-user/62954 . Can we use this Shingle filter? See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/shingle/ShingleFilter.html
- SEARCH-4923 & SKER4949 need this Solr collection to work as a normal matching query server. Is it possible to configure and re-index Solr to handle both? (Could you duplicate the name field as name_exact and apply only exact matching, or the shingle filter, to that copy?)

Regarding "Staveforslag i søkefeltet" ("Spelling suggestions in the search field") and the latter point: if enabling both exact and non-exact matching is not possible, there is a poorer solution. It uses exact matching with a wildcard suffix on every term, and it has the limitation of only matching single words.
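The following minimal Python sketch illustrates the shingle idea. It is not Solr or Lucene code. It assumes that each list entry is indexed in the proposed name_exact field as a single untokenized, lowercased value (for example, a Solr string field). The query text is split into every contiguous word n-gram, including single words, and each n-gram becomes one exact-match clause. The clauses are joined with spaces so that Solr's default OR operator applies. The helper names `shingles` and `exact_match_query` are hypothetical.

```python
def shingles(tokens):
    """Return every contiguous word n-gram (shingle) of tokens, unigrams
    included, in order of first appearance and without duplicates."""
    seen = {}  # dicts keep insertion order (Python 3.7+)
    n = len(tokens)
    for start in range(n):
        for end in range(start + 1, n + 1):
            seen.setdefault(" ".join(tokens[start:end]), None)
    return list(seen)


def exact_match_query(user_query, field="name_exact"):
    """Build an OR query with one exact-match clause per shingle.

    Assumes `field` (hypothetical) holds each list entry as one lowercased,
    untokenized value, so a quoted clause only matches an entry equal to
    the whole shingle."""
    tokens = user_query.lower().split()
    # Quotes keep multi-word shingles together as one value; escaping of
    # Lucene special characters is omitted from this sketch.
    clauses = ['%s:"%s"' % (field, s) for s in shingles(tokens)]
    # Space-separated clauses rely on Solr's default OR operator.
    return " ".join(clauses)


if __name__ == "__main__":
    print(exact_match_query("Michael Wever"))
    # name_exact:"michael" name_exact:"michael wever" name_exact:"wever"
```

With this query, "michael wever" also finds a list entry that is exactly "michael". Entries that merely contain one of the words do not match, which keeps the number of results to process small. Lucene's ShingleFilter would produce the n-grams inside the analysis chain rather than in client code.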
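The poorer fallback could look like the following Python sketch. Like the previous one, it is illustrative only and not Solr code. Each query term gets a trailing `*` wildcard against the same hypothetical name_exact field, and the clauses are OR-ed by Solr's default operator. The function name `wildcard_query` is hypothetical. Terms are lowercased on the assumption that the indexed entries are lowercase, and escaping of special characters is left out.

```python
def wildcard_query(user_query, field="name_exact"):
    """Fallback: one wildcard-suffix (prefix) clause per query term.

    Each clause is derived from a single term, mirroring the single-word
    limitation noted in the comment."""
    terms = user_query.lower().split()
    # Space-separated clauses rely on Solr's default OR operator.
    return " ".join("%s:%s*" % (field, term) for term in terms)


if __name__ == "__main__":
    print(wildcard_query("Michael Wever"))
    # name_exact:michael* name_exact:wever*
```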
_______________________________________________ Kernel-issues mailing list [email protected] http://sesat.no/mailman/listinfo/kernel-issues
