Hi Phil,

The query you gave worked. So that confirms StandardAnalyzer tokenizes
URLs differently from what I expected.
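
For anyone reading this later, here is a rough sketch of the behaviour as I understand it. It is a toy model in plain Java, not Lucene's actual tokenizer: the class name HostTokenSketch and its regular expression are made up for illustration, and the only assumption taken from this thread is that a dotted hostname is kept as one term instead of being split at the dots.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: this is NOT Lucene's StandardAnalyzer. It just mimics the
// "dotted hostname stays one term" behaviour discussed in this thread.
class HostTokenSketch {
    // Alternation order matters: try the dotted-host form first, then plain words.
    private static final Pattern TOKEN =
        Pattern.compile("[A-Za-z0-9]+(?:\\.[A-Za-z0-9]+)+|[A-Za-z0-9]+");

    // Returns the lowercased terms this toy tokenizer would produce.
    static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            terms.add(m.group().toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> terms = tokenize("http://en.wikipedia.org/wiki/Rahul_Dravid");
        System.out.println(terms);
        // "wiki" is a term of its own, so url:wiki can match.
        System.out.println("wiki: " + terms.contains("wiki"));
        // "wikipedia" only occurs inside the host term, so url:wikipedia cannot match.
        System.out.println("wikipedia: " + terms.contains("wikipedia"));
        // The full hostname is a term, so url:"en.wikipedia.org" can match.
        System.out.println("en.wikipedia.org: " + terms.contains("en.wikipedia.org"));
    }
}
```

To see what the real analyzer produces, Luke or printing the analyzed tokens (as Phil and Shai suggested) is still the thing to trust.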

Thanks,
Prashant.

On Sun, Aug 2, 2009 at 11:22 PM, Phil Whelan <phil...@gmail.com> wrote:

> Hi Prashant,
>
> I agree with Shai that using Luke, and printing out what the Document
> looks like before it goes into the index, is going to be your best
> bet for debugging this problem.
>
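> The problem you're having is that StandardAnalyzer does not break up
> the hostname into separate terms, as it has a special case for
> hostnames and acronyms. The whole hostname is indexed as the single
> term en.wikipedia.org, so url:wikipedia has nothing to match.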
>
> This should work...
> +title:"rahul dravid" +url:"en.wikipedia.org"
>
> Thanks,
> Phil
>
> On Sun, Aug 2, 2009 at 10:14 AM, prashant ullegaddi <prashullega...@gmail.com> wrote:
> > Yes, I'm sure that title:"Rahul Dravid" is extracted properly, and there is
> > a document relevant to this query as well.
> > The following query and its results prove it:
> >
> > Enter query:
> > Searching for: +title:"rahul dravid" +url:wiki
> > 4 total matching documents
> >   trec-id: clueweb09-enwp02-13-14368, URL: http://en.wikipedia.org/wiki/Rahul_Dravid
> >   trec-id: clueweb09-enwp01-83-11378, URL: http://en.wikipedia.org/wiki/Rahul_S_Dravid
> >   trec-id: clueweb09-en0011-08-22737, URL: http://www.reference.com/browse/wiki/Rahul_Dravid
> >   trec-id: clueweb09-enwp01-69-13556, URL: http://en.wikipedia.org/wiki/Rahul_Sharad_Dravid
> > Press (q)uit or enter number to jump to a page.
> >
> > But see the following query:
> >
> > Enter query:
> > +title:"rahul dravid" +url:"wikipedia"
> > Searching for: +title:"rahul dravid" +url:wikipedia
> > 0 total matching documents
> > Press (q)uit or enter number to jump to a page.
> >
> > Isn't it weird?
> >
> > -- Prashant.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
>
>
