Hi Doug

Thanks for the feedback - it is much appreciated.  I understand your
recommendations for changes to Lucene and plan to incorporate them in our
copy of Lucene.  Do you, Scott or Eugene plan to add these public abstract
methods to the core code as well?

Thanks also for the tips on improving the performance of my hit collector.
So far, trial testing has shown impressive performance for retrieving hit
results (though our indexes may not yet be running at full capacity).  At this
stage we intend to return all the hits, since we are not expecting an
excessive number of them, but even so I will probably try to sort them as I
go along, rather than waiting until the end.
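
As a rough sketch of what keeping the hits ordered as they arrive could look
like: the class below is hypothetical (TopHitsSketch, Hit and maxHits are names
invented for illustration, not part of Lucene) and uses only java.util, in
current Java syntax.  It keeps the best maxHits hits in a min-heap, so each new
hit is compared against the worst hit retained so far and replaces it only if
it scores higher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper, not a Lucene class: retains only the best maxHits hits. */
final class TopHitsSketch {
    /** One (document number, score) pair. */
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int maxHits;
    // Min-heap on score: the worst retained hit is always at the head.
    private final PriorityQueue<Hit> queue;

    TopHitsSketch(int maxHits) {
        if (maxHits < 1) throw new IllegalArgumentException("maxHits must be >= 1");
        this.maxHits = maxHits;
        this.queue = new PriorityQueue<>(maxHits, (a, b) -> Float.compare(a.score, b.score));
    }

    /** Same shape as a collect(int doc, float score) callback. */
    void collect(int doc, float score) {
        if (queue.size() < maxHits) {
            queue.add(new Hit(doc, score));          // still room: keep everything
        } else if (score > queue.peek().score) {
            queue.poll();                            // evict the current worst hit
            queue.add(new Hit(doc, score));
        }
    }

    /** Returns the retained hits, best score first. */
    List<Hit> topHits() {
        List<Hit> result = new ArrayList<>(queue);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

Memory stays bounded by maxHits, and the only sort left at the end is over the
retained hits.  If we really do need every hit, maxHits would have to be as
large as the result set, and the benefit disappears.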

With respect to the groups, I will analyse whether bit vector filters can be
used in our system.  Most of our access security is governed by groups, but I
believe rules may also be set at user level, which may complicate matters.  I
will need to look into this further.
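
One way the group and user rules might combine, as a sketch only: it assumes
user-level rules can only grant extra access (if they can also deny access, a
second set would have to be subtracted with BitSet.andNot).  AccessBitsSketch
and its methods are hypothetical names, not Lucene APIs, and how the resulting
bits would be wired into a filter is left out.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not a Lucene API: readable-document bits per group and per user. */
final class AccessBitsSketch {
    private final int maxDoc;
    private final Map<String, BitSet> groupBits = new HashMap<>();
    private final Map<String, BitSet> userBits = new HashMap<>();

    AccessBitsSketch(int maxDoc) { this.maxDoc = maxDoc; }

    /** Marks document number doc as readable by every member of group. */
    void allowGroup(String group, int doc) {
        groupBits.computeIfAbsent(group, g -> new BitSet(maxDoc)).set(doc);
    }

    /** Marks document number doc as readable by one specific user. */
    void allowUser(String user, int doc) {
        userBits.computeIfAbsent(user, u -> new BitSet(maxDoc)).set(doc);
    }

    /** Union of the bits for the user's groups and the user's own bits. */
    BitSet readableBy(String user, Iterable<String> groups) {
        BitSet result = new BitSet(maxDoc);
        for (String group : groups) {
            BitSet bits = groupBits.get(group);
            if (bits != null) result.or(bits);
        }
        BitSet own = userBits.get(user);
        if (own != null) result.or(own);
        return result;
    }
}
```

The group sets would be built once per index version and shared; only the
final union is per user, and that is a cheap bitwise OR.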

Once again, many thanks

Jo


Doug Cutting  (05/06/2001  17:24):
>Overall this looks good.
>
>A few comments:
>
>I think this would be cleaner if you could use the HitCollector interface
>directly, without the class MultiIndexHitCollector.  To do this you need to
>hide the document renumbering from the API, which also cleans things up.
>
>Thus I suggest that you:
>  - add the following to Searcher.java
>     public abstract void search(Query,HitCollector) throws IOException;
>     public abstract void search(Query,Filter,HitCollector)
>         throws IOException;
>  - make Searcher.doc() a public method:
>     public abstract Document doc(int i) throws IOException;
>
>Then, in MultiSearcher.search(Query,HitCollector), when you call
>IndexSearcher.search(Query,HitCollector), pass in a HitCollector which
>performs the required renumbering.
>
>This can be done with something like:
>
> public final void search(Query query, HitCollector hitResults)
>     throws IOException {
>   search(query, null, hitResults);
> }
> public final void search(Query query, Filter filter,
>                          final HitCollector hitResults)
>     throws IOException {
>   for (int i = 0; i < searchers.length; i++) {
>     final int start = starts[i];  // offset of this index's documents
>     searchers[i].search(query, filter, new HitCollector() {
>       public void collect(int doc, float score) {
>         // renumber into the combined document space
>         hitResults.collect(doc + start, score);
>       }
>     });
>   }
> }
>
>Also I note that in your example collector you do two things which hurt
>search performance a lot:
>  1. You read the document object for every potential hit.  Search will be
>several times faster if you can perform these checks without doing this.
>Perhaps you can construct a bit vector filter for different user groups,
>identifying which documents they can read.  Even though these are slow to
>construct (since they require reading every document) and need to be
>reconstructed each time the index changes, if you have very much query
>traffic it will still be much faster overall.
>  2. You save every hit and sort at the end.  This is slow and uses lots of
>memory.  If you know how you want them sorted, it is better to just keep a
>collection of the top hits.  As each hit is encountered, compare it to the
>worst hit so far collected.  If it is better, remove that one from the
>collection and add the new one in its place.  This way you never have to
>store or sort all of the hits.  See IndexSearcher.java line 75 for an
>example of this.
>
>Doug

------------------------------------------------------------------------

Joanne Sproston | Software Engineer
Teamware Group
[EMAIL PROTECTED]
phone: +44 (0)1782 794879  fax: +44 (0)1782  776667
www.teamware.com


_______________________________________________
Lucene-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-dev
