Intermediate indexing before final

Daniel B. Davis Sun, 15 Feb 2004 12:49:53 -0800


I am a newbie to Lucene, and have been learning by experiment
and from the demos.   A problem has arisen in indexing
a document after creation, and before indexing in the
permanent index. It is being indexed to this small lookaside
index in order to determine whether it is
"sponsored" [i.e. contains any word that causes it to
be included in one of the 'sponsored' document levels.]
(A separate letter deals with the larger issues of
sponsorship.)  If it is sponsored, then a setBoost for
the document will be issued, with a level-dependent
value.

The code in question arises from within IndexHTML
near:
        doc = new HTMLDocument(file);
        writer.addDocument(doc);

In the case at issue, this code has been changed to:
        doc = new HTMLDocument(file);
        int boost = sponsoredValue(doc);
        doc.setBoost(boost);
        writer.addDocument(doc);

The sponsoredValue method never returns.

The exception occurs after a longish delay in
eclipse, about 2-3 seconds.  The document used is:
          http://www.w3.org/TR/xquery
stored as a local file. The same document indexes
correctly when the call to sponsoredValue and setBoost
are removed.

HTMLDocument was modified in minor ways.  HTMLParser
is destined for modification, but is still vanilla.

Note that altering RAMDirectory to FSDirectory makes
no difference and does not change the behavior.

I greatly Appreciate any help, thank you all.

-------------------------------------------------

the Document doc:
      url: Keyword, string
      file: Unindexed, string
      modified: Keyword, string
      uid: as in HTMLdemo, string
      contents: Text, reader
      title: Text, string
      metadata: Text, string

the code:

  private static RAMDirectory ramDir = null;
  private static IndexWriter ramWriter = null;
  private static IndexReader ramReader = null;
  private static IndexSearcher ramSearcher = null;

  public int sponsoredValue(Document doc) {
      .
      .
      .
      ramDir = new RAMDirectory();
      ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
+-->  ramWriter.addDocument(doc);
|     ramWriter.close();
|     ramWriter = null;
|     ramReader = IndexReader.open(ramDir);
|     ramSearcher = new IndexSearcher(ramReader);
|     .
|     .
|     .
|     }
|
the Exception:

java.io.IOException: Pipe closed
        at java.io.PipedInputStream.receive(Unknown Source)
        at java.io.PipedInputStream.receive(Unknown Source)
        at java.io.PipedOutputStream.write(Unknown Source)
        at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
        at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at java.io.OutputStreamWriter.write(Unknown Source)
        at java.io.Writer.write(Unknown Source)
        at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
        at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
        at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Intermediate indexing before final

Reply via email to