Hi Phil,

The query you gave did work. That confirms StandardAnalyzer tokenizes URLs differently from plain text: the hostname is kept as a single term rather than being split on the dots.
Thanks,
Prashant.

On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan <phil...@gmail.com> wrote:
> Hi Prashant,
>
> I agree with Shai, that using Luke and printing out what the Document
> looks like before it goes into the index, are going to be your best
> bet for debugging this problem.
>
> The problem you're having is that StandardAnalyzer does not break-up
> the hostname into separate terms, as it has a special case for
> hostnames and acronyms.
>
> This should work...
> +title:"rahul dravid" +url:"en.wikipedia.org"
>
> Thanks,
> Phil
>
> On Sun, Aug 2, 2009 at 10:14 AM, prashant ullegaddi<prashullega...@gmail.com> wrote:
> > Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and there is
> > a document relevant to this query as well.
> > The following query and its results proves it:
> >
> > Enter query:
> > Searching for: +title:"rahul dravid" +url:wiki
> > 4 total matching documents
> > trec-id: clueweb09-enwp02-13-14368, URL: http://en.wikipedia.org/wiki/Rahul_Dravid
> > trec-id: clueweb09-enwp01-83-11378, URL: http://en.wikipedia.org/wiki/Rahul_S_Dravid
> > trec-id: clueweb09-en0011-08-22737, URL: http://www.reference.com/browse/wiki/Rahul_Dravid
> > trec-id: clueweb09-enwp01-69-13556, URL: http://en.wikipedia.org/wiki/Rahul_Sharad_Dravid
> > Press (q)uit or enter number to jump to a page.
> >
> > But see following query:
> >
> > Enter query:
> > +title:"rahul dravid" +url:"wikipedia"
> > Searching for: +title:"rahul dravid" +url:wikipedia
> > 0 total matching documents
> > Press (q)uit or enter number to jump to a page.
> >
> > Isn't it weird?
> >
> > -- Prashant.
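
For anyone finding this thread later, here is a rough sketch of why url:wiki matches while url:wikipedia does not. It does not use Lucene at all. The class HostnameTokenSketch, its regex, and the methods tokenize and matchesTerm are all hypothetical, written only to imitate the behaviour Phil describes (a dotted hostname kept as one term); the real StandardAnalyzer grammar is more involved and may split the rest of the URL differently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is NOT Lucene's StandardAnalyzer.
class HostnameTokenSketch {

    // Assumed rule: alphanumeric runs joined by dots form ONE token,
    // so "en.wikipedia.org" is never split into "en", "wikipedia", "org".
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)*");

    // Split text into lowercased tokens according to the assumed rule.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // A term query matches only if the query term equals a whole indexed token.
    static boolean matchesTerm(String fieldValue, String queryTerm) {
        return tokenize(fieldValue).contains(queryTerm.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String url = "http://en.wikipedia.org/wiki/Rahul_Dravid";
        System.out.println(tokenize(url));
        System.out.println(matchesTerm(url, "wiki"));             // true
        System.out.println(matchesTerm(url, "wikipedia"));        // false
        System.out.println(matchesTerm(url, "en.wikipedia.org")); // true
    }
}
```

Under these assumptions "wiki" is a token of its own (it comes from the path), but "wikipedia" only ever appears inside the single token "en.wikipedia.org", so an exact term lookup for it finds nothing. Luke remains the way to see what the real analyzer put into the index.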