Re: Weird time results doing wildcard queries

Yonik Seeley Thu, 08 Sep 2005 17:08:43 -0700

A HitCollector returns docs by the order they are found (in the index, not 
by relevance). Use a search method that returns TopDocs if you want the 
first n documents without executing the query more than once (Hits uses this 
internally).


-Yonik
Now hiring -- http://tinyurl.com/7m67g


On 9/8/05, Richard Krenek <[EMAIL PROTECTED]> wrote:
> 
> This answers a lot of questions and observations. We looked in the source
> code of the Hits object and found the getMoreDocs(int min) method which 
> does
> what you stated below.
> 
> We are assuming you meant for us to use a HitCollector instead. This 
> brings
> up a new question does the Searcher call the collect method in order that
> the docs are in the index or highest score first. We will look in LIA and 
> do
> some testing to find this answer, but if you know it, feel free to answer.
> 
> Thanks, this has helped our understanding of Lucene a lot.
> 
> On 9/8/05, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> >
> > The Hits class collects the document ids from the query in batches. If 
> you
> > iterate beyond what was collected, the query is re-executed to collect
> > more
> > ids.
> >
> > You can use the expert level search methods on IndexSearcher if this 
> isn't
> > what you want.
> >
> > -Yonik
> >
> > On 9/8/05, Richard Krenek <[EMAIL PROTECTED]> wrote:
> > >
> > > I understand that for the query, but why does it matter once you have
> > the
> > > Hits object? That is the part I'm baffled on. The query with the
> > wildcard
> > > in
> > > the front takes a lot longer, but we expected that.
> > >
> > > On 9/8/05, Jeremy Meyer <[EMAIL PROTECTED]> wrote:
> > > >
> > > > The issue isn't with multiple wildcards exactly. Specifically, the
> > > problem
> > > > is if the query starts with a wildcard. In the case where it starts
> > with
> > > a
> > > > wildcard, lucene has no option but to linearly go over every term in
> > the
> > > > index to see if it matches your pattern. It must visit every singe
> > term
> > > in
> > > > the index. If it doesn't start with a wildcard, lucene can skip to 
> the
> > > > relevant part of the index and only visit the relevant terms. For 
> this
> > > > reason, many people that use Lucene choose to disable having 
> wildcard
> > at
> > > the
> > > > start of a search term. This is discussed in the "Lucene in Action"
> > > book.
> > > >
> > > > ~Jack~
> > > >
> > > > >>Hello All,
> > > > >>I am getting some weird time results when retrieving documents 
> back
> > > from
> > > > a hits object. I am just timing this bit of code:
> > > > >>Hits hits = searcher.search(query);
> > > > >>long startTime = System.currentTimeMillis(); for (int i = 0; i <
> > > > hits.length(); i++) { Document doc = hits.doc(i); String field =
> > doc.get
> > > (defaultField);
> > > > } System.out.println("Cycle Time: "+(System.currentTimeMillis
> > > > ()-startTime));
> > > > >>
> > > > >>It seems when I have a wilcard query like *abcd* vs weqrew*, the
> > > *abcd*
> > > > query will always take longer to retrieve the documents even if they
> > are
> > > of
> > > > simular result sizes. We are talking a big difference 1 second vs 
> 16.
> > It
> > > is
> > > > consistent no matter >>what order I run the queries in, terms with
> > > multiple
> > > > wildcards always take longer to retrieve the documents. I am not
> > > counting
> > > > the time of the query.
> > > > >>
> > > > >>The index is 2.18 GB, 9 fields per document, 10,694,190 documents,
> > > > >>25,538,793 terms and has been optimized.
> > > > >>
> > > > >>I am not sure if this is a real or just a percieved issue. We 
> cannot
> > > > figure out why the type of query would affect the speed it takes to
> > > retrieve
> > > > each document. We have run this on both Windows XP and Linux. With 
> the
> > > same
> > > > results. Also to >>note we did watch GC and this did not have any
> > > > significant impact that we could se.
> > > > >>
> > > > >>We are trying to figure out what could cause this and how we can
> > work
> > > > around it.
> > > > >>
> > > > >>
> > > > >>Thanks,
> > > > >>Richard
> > > >
> > >
> > >
> >
> >
> > --
> > -Yonik
> > Now hiring -- http://tinyurl.com/7m67g
> >
> >
> 
>

Re: Weird time results doing wildcard queries

Reply via email to