RE: [Lucene-users] Ant Building Problem

Alex Murzaku Wed, 20 Jun 2001 12:57:27 -0700
Download jakarta-ant-1.3-optional.jar and put it in ant/lib. It provides the
optional tasks, including the missing JUnitTask.
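
To confirm whether the jar is actually visible, one option is to try loading
the class named in the error with the same classpath Ant uses. This is only a
minimal sketch: the helper class ClasspathCheck and its method isLoadable are
made up for illustration and are not part of Ant or Lucene.

```java
// Hypothetical diagnostic helper (not part of Ant or Lucene): reports whether
// a class can be found on the classpath this program was started with.
class ClasspathCheck {
    static boolean isLoadable(String className) {
        try {
            // initialize=false: we only care whether the class can be located.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the error message in the original post.
        String name = "org.apache.tools.ant.taskdefs.optional.junit.JUnitTask";
        System.out.println(name + (isLoadable(name) ? " found" : " NOT found"));
    }
}
```

If this reports NOT found when run with the jars from ant/lib on the
classpath, the optional jar is still missing or is in the wrong directory.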

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]] On Behalf Of Gerardo Arroyo
Sent: Wednesday, June 20, 2001 3:55 PM
To: '[EMAIL PROTECTED]'
Subject: [Lucene-users] Ant Building Problem


Hi!

Every time I try to build the latest Lucene from CVS, I get a Java
exception. Any advice?

Note: the org.apache.tools.ant.taskdefs.optional.junit.JUnitTask doesn't
exist.


C:\lucene\build.xml:201: taskdef class org.apache.tools.ant.taskdefs.optional.junit.JUnitTask cannot be found
--- Nested Exception ---
java.lang.ClassNotFoundException: org.apache.tools.ant.taskdefs.optional.junit.JUnitTask
        at java.net.URLClassLoader$1.run(URLClassLoader.java, Compiled Code)
        at java.lang.Exception.<init>(Exception.java, Compiled Code)
        at java.lang.ClassNotFoundException.<init>(ClassNotFoundException.java, Compiled Code)
        at java.net.URLClassLoader$1.run(URLClassLoader.java, Compiled Code)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java, Compiled Code)
        at java.lang.ClassLoader.loadClass(ClassLoader.java, Compiled Code)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java, Compiled Code)
        at java.lang.ClassLoader.loadClass(ClassLoader.java, Compiled Code)
        at org.apache.tools.ant.taskdefs.Taskdef.execute(Taskdef.java:111)
        at org.apache.tools.ant.ProjectHelper$TaskHandler.finished(ProjectHelper.java:482)
        at org.apache.tools.ant.ProjectHelper$AbstractHandler.endElement(ProjectHelper.java, Compiled Code)
        at com.sun.xml.parser.Parser.maybeElement(Parser.java:1413)
        at com.sun.xml.parser.Parser.content(Parser.java:1499)
        at com.sun.xml.parser.Parser.maybeElement(Parser.java:1400)
        at com.sun.xml.parser.Parser.parseInternal(Parser.java:492)
        at com.sun.xml.parser.Parser.parse(Parser.java:284)
        at javax.xml.parsers.SAXParser.parse(SAXParser.java:155)
        at org.apache.tools.ant.ProjectHelper.parse(ProjectHelper.java, Compiled Code)
        at org.apache.tools.ant.ProjectHelper.configureProject(ProjectHelper.java:85)
        at org.apache.tools.ant.Main.runBuild(Main.java, Compiled Code)
        at org.apache.tools.ant.Main.main(Main.java:149)

-----Original Message-----
From: Doug Cutting [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, June 5, 2001 11:11 AM
To: 'Snyder, David'; '[EMAIL PROTECTED]'
Subject: RE: [Lucene-users] Question about search that returns a TopDocs
object...


> From: Snyder, David [mailto:[EMAIL PROTECTED]]
>
> I've noticed that a search method with a new signature has been added to
> the Searcher and MultiSearcher classes since the 1.0 release...  one that
> returns a TopDocs object and presumably only a limited number of results.

This is used internally by Searcher.search().  Lucene only ever collects the
top documents.  The Hits object initially retrieves the top 100 documents.
Accessing documents beyond these triggers a re-query.

Note in particular that a Document object is never read from the index until
Hits.doc(int) is called.  Reading documents is slow.  As a rule of thumb,
for high-performance search it should only be done for documents which will
be displayed.

> I was getting ready to start experimenting with this method, but noticed
> it is not public, but protected.  Is this intentional?

Yes, to keep the public API simple.  Calling these directly is not much more
efficient than accessing only the top results from a Hits object.

If you want more explicit control, implement your own HitCollector and use
IndexSearcher.search(Query, HitCollector).  Remember, an efficient
HitCollector must not read document objects, e.g., by calling Searcher.doc()
or IndexReader.document(), but should rather only use the document number
and its score.  The default hit collector is an anonymous class defined at
line 75 of IndexSearcher.java.
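
As an illustration of a collector that uses only the document number and
score, here is a minimal sketch that keeps the best-scoring hits in a small
heap. It is not the anonymous collector from IndexSearcher.java. The names
TopScoreCollector, Hit and topDocs are made up for this example, and the
HitCollector class below is only a stand-in with the collect(int, float)
signature used elsewhere in this message, so that the sketch compiles without
Lucene on the classpath.

```java
import java.util.PriorityQueue;

// Stand-in for Lucene's HitCollector (assumption: same collect signature as
// used in this thread). With Lucene available, use the real class instead.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical collector: keeps the `limit` best-scoring hits and never
// touches Document objects, only doc numbers and scores.
class TopScoreCollector extends HitCollector {
    static final class Hit {
        final int doc;
        final float score;
        Hit(int doc, float score) { this.doc = doc; this.score = score; }
    }

    private final int limit;
    // Min-heap on score: the weakest retained hit sits at the head.
    private final PriorityQueue<Hit> heap;

    TopScoreCollector(int limit) {
        this.limit = limit;
        this.heap = new PriorityQueue<>(Math.max(1, limit),
            (a, b) -> Float.compare(a.score, b.score));
    }

    @Override
    public void collect(int doc, float score) {
        if (heap.size() < limit) {
            heap.add(new Hit(doc, score));
        } else if (limit > 0 && score > heap.peek().score) {
            heap.poll();                    // evict the weakest retained hit
            heap.add(new Hit(doc, score));
        }
    }

    // Returns doc numbers ordered best score first. Drains the heap, so call once.
    int[] topDocs() {
        int[] docs = new int[heap.size()];
        for (int i = docs.length - 1; i >= 0; i--) {
            docs[i] = heap.poll().doc;
        }
        return docs;
    }
}
```

Only the documents that will be displayed then need to be read, using the
doc numbers returned by topDocs().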

Perhaps it is worth making this collector public, as well as making TopDocs
and ScoreDoc public.  Thoughts?

It is trivial to construct hit collectors that only return the most recent
or oldest documents.  Document numbers increase as documents are added, and
collectors are always called with increasing document numbers.  So the
oldest 10 documents are always the first 10 passed to the collector, and the
newest are always the last 10 passed to the collector.

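  class OldestHits extends HitCollector {
    private int[] hits;        // doc numbers of the first `count` hits
    private int hitCount = 0;  // number of slots filled so far
    public OldestHits(int count) {
      this.hits = new int[count];
    }
    public void collect(final int doc, final float score) {
      // Post-increment so the first hit lands in slot 0 and the last slot
      // is filled without running off the end of the array.
      if (hitCount < hits.length)
        hits[hitCount++] = doc;
    }
    // Documents are read only here, after collection has finished.
    public Document[] hits(IndexReader reader) throws java.io.IOException {
      Document[] result = new Document[hitCount];
      for (int i = 0; i < hitCount; i++) {
        result[i] = reader.document(hits[i]);
      }
      return result;
    }
  }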

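  class NewestHits extends HitCollector {
    private int[] hits;        // ring buffer of the most recent doc numbers
    private int hitCount = 0;  // total hits seen; may exceed hits.length
    public NewestHits(int count) {
      this.hits = new int[count];
    }
    public final void collect(final int doc, final float score) {
      // Post-increment so the first hit lands in slot 0; wrap when full.
      hits[hitCount++ % hits.length] = doc;
    }
    // Returns the retained documents in the order they were collected.
    public Document[] hits(IndexReader reader) throws java.io.IOException {
      int count = Math.min(hitCount, hits.length);
      int start = hitCount - count;  // position of the oldest retained hit
      Document[] result = new Document[count];
      for (int i = 0; i < count; i++) {
        result[i] = reader.document(hits[(start + i) % hits.length]);
      }
      return result;
    }
  }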

Caveat Hackor: This code has not been tested!

> We are using Lucene with very large indexes and I'm looking for ways to
> decrease the memory requirements when getting a large number of hits.
> Any suggestions?

See above.  Also, if you never need all of the document objects at once,
write a hit collector which returns hits as a raw array of integers and only
read the documents as you process them.
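
A minimal sketch of such a collector follows. The names DocIdCollector and
docIds are made up for this example, and HitCollector is again a stand-in
with the collect(int, float) signature used above, so that the code compiles
without Lucene.

```java
import java.util.Arrays;

// Stand-in for Lucene's HitCollector (assumption: same collect signature as
// used in this thread). With Lucene available, use the real class instead.
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Hypothetical collector: records every matching doc number in a growable
// int array and never loads a Document during collection.
class DocIdCollector extends HitCollector {
    private int[] docs = new int[64];
    private int count = 0;

    @Override
    public void collect(int doc, float score) {
        if (count == docs.length) {
            docs = Arrays.copyOf(docs, count * 2);  // double the capacity when full
        }
        docs[count++] = doc;
    }

    // Returns a copy trimmed to the number of hits, in collection order.
    int[] docIds() {
        return Arrays.copyOf(docs, count);
    }
}
```

The caller can then loop over docIds() and call IndexReader.document() for
one number at a time, so that only one Document is held in memory while it
is being processed.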

Doug

_______________________________________________
Lucene-users mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/lucene-users
