Hi folks,
I'm now finishing a SOLR project for one of my customers (replacing
Microsoft FAST server with SOLR) and got permission to contribute
our improvements.
The most interesting thing is a "FrequentSearchTerm" component which
lets us analyze the user-supplied search queries in real time:
+) it keeps track of the last queries per core using a LIFO buffer (so
we have an upper limit of memory consumption)
+) per query entry we keep track of the number of invocations, the
average number of result documents and the average execution time
+) we allow for custom searches across the frequent search terms using
the MVEL expression language (see http://mvel.codehaus.org)
++) find all queries which did not yield any results - 'meanHits==0'
++) find all "iPhone" queries - 'searchTerm.contains("iphone") ||
searchTerm.contains("i-phone")'
++) find all long-running "iPhone" queries -
'(searchTerm.contains("iphone") || searchTerm.contains("i-phone")) &&
meanTime>50'
+) GUI: we have a JSP page for browsing the frequent search terms
+) there is also an XML/CSV export we use to display the 50 most
frequently used search queries in real-time
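
To make the bookkeeping above a bit more concrete, here is a minimal
Java sketch. It is not the actual component code: the names
FrequentSearchTerms, TermStats, record, find and mostFrequent are made
up for illustration, the buffer is assumed to evict the least recently
used search term once the limit is reached, and a plain
java.util.function.Predicate stands in for the MVEL expression so the
example has no external dependencies.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Illustrative sketch only; all names here are hypothetical. */
class FrequentSearchTerms {

    /** Running statistics for one distinct search term. */
    static final class TermStats {
        final String searchTerm;
        long invocations;
        double meanHits;
        double meanTime; // milliseconds

        TermStats(String searchTerm) {
            this.searchTerm = searchTerm;
        }

        /** Folds one query execution into the running means. */
        void record(long hits, long timeMillis) {
            invocations++;
            meanHits += (hits - meanHits) / invocations;
            meanTime += (timeMillis - meanTime) / invocations;
        }
    }

    private final Map<String, TermStats> entries;

    FrequentSearchTerms(final int maxEntries) {
        // Access-ordered map: the eldest entry is the least recently used
        // one, and it is dropped as soon as the size limit is exceeded.
        // (Assumed eviction policy, chosen to bound memory consumption.)
        this.entries = new LinkedHashMap<String, TermStats>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, TermStats> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Records one executed query with its hit count and execution time. */
    synchronized void record(String searchTerm, long hits, long timeMillis) {
        TermStats stats = entries.get(searchTerm);
        if (stats == null) {
            stats = new TermStats(searchTerm);
            entries.put(searchTerm, stats);
        }
        stats.record(hits, timeMillis);
    }

    /** Returns all entries matching the filter (stand-in for an MVEL expression). */
    synchronized List<TermStats> find(Predicate<TermStats> filter) {
        List<TermStats> result = new ArrayList<>();
        for (TermStats stats : entries.values()) {
            if (filter.test(stats)) {
                result.add(stats);
            }
        }
        return result;
    }

    /** Returns up to 'limit' entries, most frequently invoked first. */
    synchronized List<TermStats> mostFrequent(int limit) {
        List<TermStats> sorted = new ArrayList<>(entries.values());
        sorted.sort(Comparator.comparingLong((TermStats s) -> s.invocations).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

In this sketch the 'meanHits==0' search corresponds to
find(s -> s.meanHits == 0), and the "50 most frequently used search
queries" export would start from mostFrequent(50).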
We use this component
+) to get input for QA regarding frequently used search terms
+) to find strange queries, e.g. queries returning no or too many
results, for instance caused by WordDelimiterFilter
+) to keep our management happy ... :-)
So the question is - is the community interested in such a contribution?
If yes, then I need to spend some time improving the code from
"industrial quality" to "open source quality" including documentation
... you know what I mean .... :-)
Thanks in advance,
Siegfried Goeschl
PS: Not sure if the name "Frequent Search Term Component" is perfectly
suitable as it was taken from FAST - suggestions welcome