Activity report on

  *[JIRA] New Feature SKER4949 - Solr SearchCommand implementation*

  Scarab Link: http://sesat.no/scarab/issues/id/SKER4949
  Module: Sesat> Kernel


  Activity generated by Glenn-Erik Sandbakken ([EMAIL PROTECTED]) at 08/21/2008 10:52

  *Reasons for the changes*


  *Comments*
  - By Glenn-Erik Sandbakken - 08/21/2008 10:52 ---
  "Some more regarding the default behavior:
Solr removes certain (stop) words.
If you search for "name:as" you get 0 hits.
This also affects phrase searches: for name:"sesam as",
documents with name=sesam are given the same score as those with "sesam as",
and documents with name="sesame" etc. are given the same score as well
(probably because they are all considered stemmed).
We probably don't want to stem the content in our lists, especially because
much of our content is not in English.
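
To illustrate why a stop word can make a query return nothing, here is a
minimal Python sketch of an analysis chain. It is a toy, not Solr's code: the
stop list, the token pattern and the name analyze are assumptions made up for
this example, and the real behavior depends on the field type configured in
schema.xml.

```python
import re

# Hypothetical stop list, for illustration only. The real list is whatever
# the field type in schema.xml is configured to use.
STOP_WORDS = {"a", "an", "and", "as", "the"}

# Toy token pattern: runs of ASCII letters/digits plus the Norwegian letters.
# This is an assumption of the sketch, not Solr's tokenizer.
TOKEN = re.compile(r"[a-z0-9æøå]+")


def analyze(text):
    """Lowercase the text, split it into tokens and drop stop words."""
    tokens = TOKEN.findall(text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


print(analyze("as"))        # nothing is left to search for
print(analyze("sesam as"))  # same tokens as a plain "sesam"
```

Under these assumptions "sesam as" and "sesam" end up as the same token list,
which is consistent with the identical scores described above. The same toy
chain also maps values that differ only in punctuation to one token.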

name:"air crash" returns only those with "air" followed by "crash" (because 
both words are indexed).
name:air crash
returns all documents with "air" OR "crash" (and documents with stemmed 
versions).
If you need both to be included, use:
name:air AND crash
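
A minimal Python sketch of the three query forms above (not part of Sesat; the
helper names are made up for illustration). One caveat worth checking: in the
standard Lucene query syntax a field prefix applies only to the term, phrase
or parenthesised group that immediately follows it, so in name:air AND crash
the word crash is matched against the default search field unless name is that
default. The grouped form used below avoids the ambiguity.

```python
def phrase_query(field, words):
    """All words, adjacent and in order: field:"w1 w2"."""
    return '{}:"{}"'.format(field, " ".join(words))


def any_query(field, words):
    """At least one of the words: field:(w1 OR w2)."""
    return "{}:({})".format(field, " OR ".join(words))


def all_query(field, words):
    """Every word, in any position: field:(w1 AND w2)."""
    return "{}:({})".format(field, " AND ".join(words))


# Note: this sketch does not escape query-syntax special characters
# (quotes, colons, parentheses, ...) inside the words.
words = ["air", "crash"]
print(phrase_query("name", words))
print(any_query("name", words))
print(all_query("name", words))
```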

=> To be able to do exact matches we need to change the schema.xml and reindex.

After the refeed yesterday there are "only" 6.905.780 documents in the index.
Part of the reason may be that documents containing only stop words are
removed.
I have seen "much" content in our lists that is only a few strange characters
(e.g. "²", "³", "m²", "m³" etc.)
I've also seen entries that are most likely considered the same by Solr, such
as "meter" and "-meter" and "meter*", "meter-", "--meter" etc.

And don't forget the note about character encoding in my previous post.
When I search (also from the Solr admin page) for words containing characters
other than a-z0-9, they appear to be sent to Solr as ISO-encoded text, while
Solr works in UTF-8.
I therefore have to URL-encode the search.
You can quickly find the urlencoded version of strings and characters here:
http://www.blooberry.com/indexdot/html/topics/urlencoding.htm
(at the bottom)
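
Instead of looking the characters up by hand, the encoding can be done in
code. A minimal Python sketch, assuming that "iso" above means ISO-8859-1 and
that Solr expects percent-encoded UTF-8 bytes; the base URL and the function
name solr_select_url are hypothetical placeholders, not our actual setup.

```python
from urllib.parse import quote, urlencode

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://localhost:8983/solr/select"


def solr_select_url(query, base_url=BASE_URL):
    """Build a select URL with the q parameter percent-encoded as UTF-8."""
    return base_url + "?" + urlencode({"q": query}, encoding="utf-8")


# The same character gives different bytes in the two encodings, which is
# why a query sent as ISO-8859-1 is not understood as UTF-8.
print(quote("²", encoding="utf-8"))    # %C2%B2
print(quote("²", encoding="latin-1"))  # %B2

print(solr_select_url('name:"m²"'))
```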

Next we need to meet to determine how I should configure Solr for the purposes
of our lists.
There is a variety of options to tweak, and we may not know which ones we want
to use before we actually know the options that Solr gives.
"
_______________________________________________
Kernel-issues mailing list
[email protected]
http://sesat.no/mailman/listinfo/kernel-issues
