RE: HITCOLLECTOR+SCORE+DELIMMA

Karthik N S Mon, 13 Dec 2004 12:29:37 -0800

Hi Erik

What exactly do u mean by this

>We've emphasized numerous times that calling hits.doc(i) is a resource
>hit.  Don't do it for documents you aren't going to show.  To filter by
>score, use hits.score(i) first.

 I am bit Confused u mean to say Replace

   hits.doc(i)

    by

  hits.score(i)

Also

> Ah, so you are accessing every document to get this field information.
> It is incorrect that you cannot filter prior to getting hits.  You have
> a couple of options in filtering by a field value - use a QueryFilter
. or simply AND a RangeQuery to the original query.

Since the portal we ar building for is a eCommerce one, We have to return
SearchWord across

  ( >7 ) x 1000 x  15000  documents , Get most of the Relevant His (Where
ever Score is between 0.5 to 1.0 )

  and then Sort the adjecent Fields 'Vendors' and 'Price' in ASC Order

 In such a case We cannot use RangeQuery.... without priorly knowing what
exactly the Consumer want's

 Is it not possible to have a Generalized Filter in further versions of API
, to Inject some minor factors prior to

 getting the Hits returned.

Thx in advance
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, December 14, 2004 3:44 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMMA

On Dec 13, 2004, at 11:16 PM, Karthik N S wrote:
>  time [ A simple search of 'handbags' returned 1,60,000 hits and time
> taken
> was 440 secs ,in production Env  / May be our
>  Coding is poor,But we are constantly improving the process ].

If your searches are taking 440 seconds, you have something more
fundamentally wrong.  You are either doing some large
wildcard/range/fuzzy expansions or you're accessing every document from
all your hits.  Is the searcher.search() method taking that long?  I
bet not.  Or rather is it the iteration over the Hits that is killing
the "search" time, which is what I suspect?

We've emphasized numerous times that calling hits.doc(i) is a resource
hit.  Don't do it for documents you aren't going to show.  To filter by
score, use hits.score(i) first.

>  { O/s Linux Gentoo , RAM 1GB, Lucene1.4.1,Appserver = Tomcat5, and
> BlackDawn Java 1.4.2 with Args  -XX:+UseParallelGC for
>
>  Garbage Collection  }

Please narrow your code down to a clean, succinct example that you can
post.  It is difficult to help you without details of your code (but
let me emphasize again - it needs to be clean and succinct so it is
quick for us to get a handle on).

>  To be One step in advance ,We also have an adjecent Fields 'Vendor
> ','Price' which we have to accordingly Compare
>  Best/Poor/Least results . So We have to have to limit the hits
> accordingly,since Lucene API does not provide any way to
>  inject this limiting facility *prior* to getting the hits .

Ah, so you are accessing every document to get this field information.
It is incorrect that you cannot filter prior to getting hits.  You have
a couple of options in filtering by a field value - use a QueryFilter
or simply AND a RangeQuery to the original query.

        Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

RE: HITCOLLECTOR+SCORE+DELIMMA

Reply via email to