Hi Norberto,
After working a bit on trying to port the Nutch CommonGrams code, I ran into
lots of dependencies on Nutch and Hadoop. Would it be possible to get more
information on how you use shingles (or code)? Are you creating shingles for
all two-word combinations or using a list of words?
Hi Tom,
I don't think anybody has worked on adding this to Solr yet. Do you mind
opening a jira issue?
On Tue, Nov 25, 2008 at 12:01 AM, Burton-West, Tom [EMAIL PROTECTED] wrote:
Hello all,
We are having problems with extremely slow phrase queries when the
phrase query contains a common
On Mon, 24 Nov 2008 13:31:39 -0500
Burton-West, Tom [EMAIL PROTECTED] wrote:
The approach to this problem used by Nutch looks promising. Has anyone
ported the Nutch CommonGrams filter to Solr?
Construct n-grams for frequently occurring terms and phrases while
indexing. Optimize phrase
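A minimal sketch of the indexing-side idea described above, for illustration only. It is not the Nutch CommonGrams code; the function name common_grams, the "_" separator, and the rule "emit a bigram when either word of an adjacent pair is common" are assumptions made for this example.

```python
def common_grams(tokens, common_words):
    """Return the original tokens plus a joined bigram for every
    adjacent pair in which at least one word is a common word.

    Hypothetical helper: the pairing rule and the "_" separator are
    assumptions for this sketch, not taken from Nutch or Solr.
    """
    out = []
    for i, tok in enumerate(tokens):
        # Always keep the single token so ordinary term queries still work.
        out.append(tok)
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            # Only pairs that involve a common word get a bigram token.
            if tok in common_words or nxt in common_words:
                out.append(tok + "_" + nxt)
    return out


if __name__ == "__main__":
    print(common_grams(["the", "who", "rocks"], {"the"}))
```

At query time a phrase containing a common word could then be matched against the much rarer bigram token instead of intersecting the long posting list of the common word itself.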
On Wed, 26 Nov 2008 10:08:03 +1100
Norberto Meijome [EMAIL PROTECTED] wrote:
We didn't notice any severe performance hit, but:
- the data set isn't huge (ca. 1 MM docs).
- it is reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer
to lower the number of hits to Solr.
To make
Hello all,
We are having problems with extremely slow phrase queries when the
phrase query contains common words. We are reluctant to just use stop
words due to various problems with false hits and some things becoming
impossible to search with stop words turned on. (For example to be or
not to
This technique was used at Infoseek in 1996, and is very effective.
It also gives a relevance improvement, because you have an estimate
of IDF for phrases (exact for two-word phrases). The terms "the" and
"who" will be very common, but "the who" is quite rare and will have
a big IDF.
wunder
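The IDF point above can be illustrated with a toy calculation. This is only a sketch: the four-document corpus is made up, the "the_who" token stands for an indexed two-word phrase, and the smoothed formula log((N + 1) / (df + 1)) + 1 is one common variant chosen for the example, not necessarily what Lucene or Infoseek used.

```python
import math


def idf(term, docs):
    """Smoothed inverse document frequency of `term` over `docs`.

    `docs` is a list of token sets. The smoothing used here is an
    assumption for this sketch.
    """
    df = sum(1 for d in docs if term in d)  # number of docs containing term
    return math.log((len(docs) + 1) / (df + 1)) + 1


# Hypothetical toy corpus; "the_who" is the indexed two-word phrase.
docs = [
    {"the", "who", "the_who"},
    {"the", "cat"},
    {"who", "is"},
    {"the", "dog"},
]

if __name__ == "__main__":
    for term in ("the", "who", "the_who"):
        print(term, round(idf(term, docs), 3))
```

Because the phrase token occurs in fewer documents than either of its words, it gets the largest IDF of the three, which is the relevance effect described above.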
On