Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2009-03-06 Thread Tom Burton-West
Hi Norberto, After working a bit on trying to port the Nutch CommonGrams code, I ran into lots of dependencies on Nutch and Hadoop. Would it be possible to get more information on how you use shingles (or the code itself)? Are you creating shingles for all two-word combinations, or only for words from a list?

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Shalin Shekhar Mangar
Hi Tom, I don't think anybody has worked on adding this to Solr yet. Do you mind opening a JIRA issue? On Tue, Nov 25, 2008 at 12:01 AM, Burton-West, Tom [EMAIL PROTECTED] wrote: Hello all, We are having problems with extremely slow phrase queries when the phrase query contains a common

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Norberto Meijome
On Mon, 24 Nov 2008 13:31:39 -0500 Burton-West, Tom [EMAIL PROTECTED] wrote: The approach to this problem used by Nutch looks promising. Has anyone ported the Nutch CommonGrams filter to Solr? Construct n-grams for frequently occurring terms and phrases while indexing. Optimize phrase
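
To illustrate the indexing idea quoted above, here is a minimal Python sketch. It is not the Nutch CommonGrams code and not a Solr filter; the word list `COMMON_WORDS`, the function name `common_grams`, and the `_` separator are all hypothetical choices made for this example. It emits every original token and, wherever a word pair contains at least one common word, an extra two-word token.

```python
# Hypothetical list of very frequent words; a real deployment would
# derive this from term statistics of the actual index.
COMMON_WORDS = {"the", "a", "of", "to"}


def common_grams(tokens, common=COMMON_WORDS, sep="_"):
    """Return the original tokens plus a bigram token for each adjacent
    pair in which at least one of the two words is in `common`."""
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)  # always keep the single-word token
        if i + 1 < len(tokens):
            nxt = tokens[i + 1]
            if tok in common or nxt in common:
                out.append(tok + sep + nxt)  # extra token for the pair
    return out


print(common_grams("the who played a show".split()))
# ['the', 'the_who', 'who', 'played', 'played_a', 'a', 'a_show', 'show']
```

With tokens like these in the index, a phrase query for "the who" could in principle be answered by looking up the single rare token `the_who` rather than intersecting the long position lists of two very common words.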

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-25 Thread Norberto Meijome
On Wed, 26 Nov 2008 10:08:03 +1100 Norberto Meijome [EMAIL PROTECTED] wrote: We didn't notice any severe performance hit, but: - the data set isn't huge (ca. 1 MM docs). - it is reindexed nightly via DIH from MS-SQL, so we can use a separate cache layer to lower the number of hits to Solr. To make

port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-24 Thread Burton-West, Tom
Hello all, We are having problems with extremely slow phrase queries when the phrase query contains common words. We are reluctant to just use stop words due to various problems with false hits and some things becoming impossible to search with stop words turned on. (For example, to be or not to

Re: port of Nutch CommonGrams to Solr for help with slow phrase queries

2008-11-24 Thread Walter Underwood
This technique was used at Infoseek in 1996, and is very effective. It also gives a relevance improvement, because you have an estimate of IDF for phrases (exact for two-word phrases). The terms "the" and "who" will be very common, but "the who" is quite rare and will have a big IDF. wunder On
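
The IDF point can be seen on a toy example. The sketch below is an assumption-laden illustration, not Infoseek's or Lucene's scoring: the six documents in `DOCS` are made up, and `idf` uses the plain textbook form log(N / df), where N is the number of documents and df is the number of documents containing the term. Two-word phrases are counted as terms in their own right, as they would be if indexed as single tokens.

```python
import math

# Made-up toy corpus, for illustration only.
DOCS = [
    "the who played live",
    "who is the author",
    "the cat sat",
    "who knows",
    "the end",
    "who saw the dog",
]


def doc_terms(text):
    """Return the set of single words and adjacent two-word phrases in text."""
    words = text.split()
    bigrams = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words) | set(bigrams)


def idf(term, docs=DOCS):
    """Textbook IDF, log(N / df). Assumes the term occurs in at least one doc."""
    df = sum(1 for d in docs if term in doc_terms(d))
    return math.log(len(docs) / df)


for term in ("the", "who", "the who"):
    print(term, round(idf(term), 3))
```

In this corpus "the" appears in five documents and "who" in four, but the phrase "the who" appears in only one, so the phrase gets a much larger IDF than either word alone.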