German,

When you said:
 "I collect every facet's bitset ... "
what is a facet? Is it each filter option on your site?
How do you get all the facets?
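(For context, here is a minimal sketch of what a bitset-per-facet approach could look like in Lucene 2.x. The countFacets name, the facetBits map, and the way the bitsets are built and cached are my assumptions, not German's actual code.)

    // Assumed imports: java.io.IOException, java.util.*,
    // org.apache.lucene.index.IndexReader, org.apache.lucene.search.*
    Map<String, Integer> countFacets(IndexSearcher searcher, IndexReader reader,
                                     Query query, Map<String, BitSet> facetBits)
        throws IOException {
      // One pass over the matches to mark which documents satisfy the query.
      final BitSet queryBits = new BitSet(reader.maxDoc());
      searcher.search(query, new HitCollector() {
        public void collect(int doc, float score) {
          queryBits.set(doc);
        }
      });
      // A facet value's count is |queryBits AND facet bitset|; no stored
      // fields are ever loaded.
      Map<String, Integer> counts = new HashMap<String, Integer>();
      for (Map.Entry<String, BitSet> e : facetBits.entrySet()) {
        BitSet and = (BitSet) queryBits.clone();
        and.and(e.getValue());
        counts.put(e.getKey(), Integer.valueOf(and.cardinality()));
      }
      return counts;
    }

The per-facet bitsets would typically be built once per index version (for example from a Filter or a TermDocs walk) and cached between queries.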





On Nov 19, 2007 1:05 PM, Haroldo Nascimento <[EMAIL PROTECTED]> wrote:
> German,
>
>  What I need is similar to your site,
> http://listados.deremate.com.ar/panaderia .
>  A search returns many results, but I only display a few of them (for
> example, the first 10 on the first page). However, to build the
> location filter options I need to read every result of the search.
> The performance problem appears when there are 30,000 results.
>
>  How do you build the "ubicación" (location) filter? Do you need to
> read all the results?
>
>  thanks
>
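(One way to build such a filter without loading all 30,000 documents is to count field values inside the HitCollector via the field cache. This is only a sketch under my own assumptions; the field name "location" and the variable names are made up, and it is not necessarily how deremate.com does it.)

    // Assumed imports: java.util.*, org.apache.lucene.index.IndexReader,
    // org.apache.lucene.search.*
    // FieldCache gives one String per document for an indexed, untokenized,
    // single-valued field, so no hits.doc(i) calls are needed.
    final String[] locations = FieldCache.DEFAULT.getStrings(reader, "location");
    final Map<String, Integer> locationCounts = new HashMap<String, Integer>();
    searcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        String loc = locations[doc];
        if (loc == null) {
          return;                        // document has no location field
        }
        Integer old = locationCounts.get(loc);
        locationCounts.put(loc, old == null ? Integer.valueOf(1)
                                            : Integer.valueOf(old.intValue() + 1));
      }
    });
    // locationCounts now holds "value -> number of hits" for the facet UI.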
>
> On Nov 19, 2007 12:05 PM, Grant Ingersoll <[EMAIL PROTECTED]> wrote:
> > I think, based on your previous question, that you just need to use
> > the search() method that returns TopDocs, not the lower-level
> > HitCollector method.  From the TopDocs, you can then access the
> > ScoreDoc, which will give you info about the doc and the score.  See 
> > http://www.lucenebootcamp.com/LuceneBootCamp/training/src/test/java/com/lucenebootcamp/training/basic/TopDocsTest.java
> >  from my Lucene Boot Camp training class for a really simple example.
> >
> > -Grant
> >
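(A minimal sketch of the TopDocs route Grant describes, using the Lucene 2.x search(Query, Filter, int) signature; the field name "title" and the count of 10 are just example values.)

    // Assumed imports: org.apache.lucene.document.Document,
    // org.apache.lucene.search.*
    TopDocs topDocs = searcher.search(query, null, 10);   // no filter, top 10
    System.out.println("total hits: " + topDocs.totalHits);
    for (int i = 0; i < topDocs.scoreDocs.length; i++) {
      ScoreDoc sd = topDocs.scoreDocs[i];
      Document d = searcher.doc(sd.doc);   // load only the docs you display
      System.out.println(sd.score + "  " + d.get("title"));
    }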
> >
> > On Nov 19, 2007, at 9:12 AM, Haroldo Nascimento wrote:
> >
> > > Mark,
> > >
> > >  How can I get the information from the Document? I think it has to
> > > happen inside the implementation of the abstract collect() method.
> > > How can I get it there?
> > >
> > >  Below is the example from the Lucene javadoc.
> > >
> > > Searcher searcher = new IndexSearcher(indexReader);
> > > final BitSet bits = new BitSet(indexReader.maxDoc());
> > > searcher.search(query, new HitCollector() {
> > >   public void collect(int doc, float score) {
> > >     bits.set(doc);
> > >   }
> > > });
> > >
> > > Thanks
> > >
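(Inside collect() you only receive the doc id and the score. One naive way to reach the stored fields from there, shown purely as a sketch with an assumed "state" field, is to load the document yourself; note that this is the same per-document I/O that makes hits.doc(i) slow, so the field-cache approach sketched earlier in the thread scales better.)

    // Assumed imports: java.io.IOException, org.apache.lucene.document.Document,
    // org.apache.lucene.index.IndexReader, org.apache.lucene.search.*
    // indexReader must be final (or a field) to be visible inside the
    // anonymous HitCollector.
    searcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        try {
          Document d = indexReader.document(doc);  // expensive per-hit load
          String state = d.get("state");           // "state" is an assumed field
          // ... use state ...
        } catch (IOException e) {
          throw new RuntimeException(e);           // collect() cannot throw IOException
        }
      }
    });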
> > >
> > > On Nov 18, 2007 8:09 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> > >> Hey Haroldo.
> > >>
> > >> First thing you need to do is *stop* using Hits in your searches.
> > >> Hits is optimized for some pretty specific use cases and you will
> > >> get along much better by using a HitCollector.
> > >>
> > >> Hits has three main functions:
> > >>
> > >> It caches documents, normalizes scores, and stores ids associated
> > >> with scores (a HitDoc). If you attempt to retrieve a HitDoc past the
> > >> first 100 from Hits, a new search will be issued to grab double the
> > >> number of HitDocs needed to satisfy your retrieval attempt. This will
> > >> be repeated every time you ask for a HitDoc beyond the current cache
> > >> (which began at 100). This means that if you need to get a HitDoc
> > >> beyond 100, Hits is not a great choice for you. You will want to use
> > >> a HitCollector instead...but remember that you are losing the
> > >> normalized scores (simple code to copy if you still want them) and
> > >> the document caching (I rarely want that anyway).
> > >>
> > >> An issue to watch out for: with Hits you do not have to say how many
> > >> docs you want back, but with a HitCollector-based solution you will
> > >> need to. This is a minor dilemma if you want to go over all of the
> > >> hits no matter what. You can pass a huge number to ensure you get
> > >> everything, but then you will be creating large data structures,
> > >> since structure sizes may be initialized from the number you pass.
> > >> Also, passing the maximum integer will cause an error (negative
> > >> initial size), because Lucene sizes a data structure to hold the
> > >> hits as n+1, and Integer.MAX_VALUE + 1 overflows to a negative number.
> > >>
> > >> - Mark
> > >>
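(If you really do want every hit, one way to sidestep the "how many docs" question entirely is a HitCollector that grows as it collects; this is a sketch with made-up variable names, not something from Mark's mail.)

    // Assumed imports: java.util.*, org.apache.lucene.search.*
    // No up-front size to guess, no Hits caching/re-search behaviour,
    // and the scores are the raw (un-normalized) ones.
    final List<Integer> docIds = new ArrayList<Integer>();
    final List<Float> rawScores = new ArrayList<Float>();
    searcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        docIds.add(Integer.valueOf(doc));
        rawScores.add(Float.valueOf(score));
      }
    });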
> > >>
> > >> Haroldo Nascimento wrote:
> > >>> I have a performance problem when I need to group the results of a
> > >>> search.
> > >>>
> > >>> I have the code below:
> > >>>
> > >>>   for (int i = 0; i < hits.length(); i++) {
> > >>>     doc = hits.doc(i);
> > >>>
> > >>>     obj1 = doc.get(Constants.STATE_DESC_FIELD_LABEL);
> > >>>     obj2 = doc.get(xxx);
> > >>>     ...
> > >>>   }
> > >>>
> > >>>  I work with a very large volume of data. The search itself runs in
> > >>> 0.300 seconds, but when the hits object holds many results, getting
> > >>> all the objects takes very long. The hits.doc(i) calls take about 2
> > >>> seconds.
> > >>>
> > >>>  For example: when hits.length() is 25,000 results, the "post-search"
> > >>> time is 7 seconds.
> > >>>
> > >>>  I read every result because I need to group the results (remove the
> > >>> duplicate results).
> > >>>
> > >>>  Is there any way in Lucene to group the results? I need something
> > >>> like SQL's "GROUP BY".
> > >>>
> > >>>  Thanks.
> > >>>
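(One way to do the grouping without the per-hit hits.doc(i) load is to deduplicate on a cached field value inside the HitCollector. A sketch under my own assumptions; "dedupKey" is an invented field name, not something from Haroldo's index.)

    // Assumed imports: java.util.*, org.apache.lucene.index.IndexReader,
    // org.apache.lucene.search.*
    // Groups on a single indexed, untokenized field without loading a
    // Document per hit, which is what makes hits.doc(i) slow.
    final String[] keys = FieldCache.DEFAULT.getStrings(reader, "dedupKey");
    final Set<String> seen = new HashSet<String>();
    final List<Integer> uniqueDocs = new ArrayList<Integer>();
    searcher.search(query, new HitCollector() {
      public void collect(int doc, float score) {
        if (seen.add(keys[doc])) {         // first doc seen for this key wins
          uniqueDocs.add(Integer.valueOf(doc));
        }
      }
    });
    // Load Documents only for the (much smaller) deduplicated list you display.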
> > >>
> > >>
> > >
> >
> > --------------------------
> > Grant Ingersoll
> > http://lucene.grantingersoll.com
> >
> > Lucene Helpful Hints:
> > http://wiki.apache.org/lucene-java/BasicsOfPerformance
> > http://wiki.apache.org/lucene-java/LuceneFAQ
> >
> >
> >
> >
> >
>
