I am new to Lucene and have been learning by experiment and from the demos. I have run into a problem when indexing a document after it is created and before it is added to the permanent index. The document is first indexed into a small lookaside index in order to determine whether it is "sponsored" (i.e. contains any word that places it in one of the "sponsored" document levels). (A separate letter deals with the larger issues of sponsorship.) If it is sponsored, setBoost is then called on the document with a level-dependent value.
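
To make the intent concrete, here is a minimal sketch of the level lookup, written without Lucene. It uses a plain word lookup in place of the lookaside index, and the class name, the word table and the boost values are all made up for illustration; the only Lucene fact assumed is that a document's default boost is 1.0.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class SponsorBoost {
    // Hypothetical table: sponsored word -> boost for its sponsorship level.
    static final Map<String, Float> SPONSORED = new LinkedHashMap<>();
    static {
        SPONSORED.put("xquery", 2.0f);
        SPONSORED.put("lucene", 4.0f);
    }

    // Returns the highest boost of any sponsored word found in the text,
    // or 1.0f (no boost) when the text contains none of them.
    static float boostFor(String text) {
        float boost = 1.0f;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            Float levelBoost = SPONSORED.get(token);
            if (levelBoost != null && levelBoost > boost) {
                boost = levelBoost;
            }
        }
        return boost;
    }

    public static void main(String[] args) {
        System.out.println(boostFor("An XQuery tutorial"));
        System.out.println(boostFor("Nothing special here"));
    }
}
```

In the real code the value returned this way would be what gets passed to doc.setBoost.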

The code in question is in IndexHTML,
near:
        doc = new HTMLDocument(file);
        writer.addDocument(doc);

In the case at issue, this code has been changed to:
        doc = new HTMLDocument(file);
        int boost = sponsoredValue(doc);
        doc.setBoost(boost);
        writer.addDocument(doc);

The sponsoredValue method never returns.

The exception occurs after a longish delay in
Eclipse, about 2-3 seconds.  The document used is:
          http://www.w3.org/TR/xquery
stored as a local file. The same document indexes
correctly when the calls to sponsoredValue and setBoost
are removed.

HTMLDocument was modified in minor ways.  HTMLParser
is destined for modification, but is still vanilla.

Note that changing RAMDirectory to FSDirectory
does not change the behavior.

I would greatly appreciate any help. Thank you all.

-------------------------------------------------

the Document doc:
      url: Keyword, string
      file: Unindexed, string
      modified: Keyword, string
      uid: as in HTMLdemo, string
      contents: Text, reader
      title: Text, string
      metadata: Text, string

the code:

  private static RAMDirectory ramDir = null;
  private static IndexWriter ramWriter = null;
  private static IndexReader ramReader = null;
  private static IndexSearcher ramSearcher = null;

  public int sponsoredValue(Document doc) {
      .
      .
      .
      ramDir = new RAMDirectory();
      ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
+-->  ramWriter.addDocument(doc);
|     ramWriter.close();
|     ramWriter = null;
|     ramReader = IndexReader.open(ramDir);
|     ramSearcher = new IndexSearcher(ramReader);
|     .
|     .
|     .
|     }
|
the Exception:

java.io.IOException: Pipe closed
        at java.io.PipedInputStream.receive(Unknown Source)
        at java.io.PipedInputStream.receive(Unknown Source)
        at java.io.PipedOutputStream.write(Unknown Source)
        at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
        at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at sun.nio.cs.StreamEncoder.write(Unknown Source)
        at java.io.OutputStreamWriter.write(Unknown Source)
        at java.io.Writer.write(Unknown Source)
        at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
        at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
        at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)
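
For what it is worth, the message itself can be reproduced with plain java.io, outside Lucene: writing to a PipedOutputStream after the connected PipedInputStream has been closed throws an IOException with the text "Pipe closed". The sketch below shows only that mechanism. Whether this is what is actually happening between ParserThread and the indexer in my case is a guess on my part, and the class and method names are made up for the demonstration.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

public class PipeClosedDemo {
    // Writes to a pipe, closes the reading end, then writes again.
    // Returns the message of the resulting IOException, or null if
    // no exception was thrown.
    static String writeAfterReaderClosed() {
        try {
            PipedOutputStream out = new PipedOutputStream();
            PipedInputStream in = new PipedInputStream(out);
            out.write('a');  // succeeds: the reading end is still open
            in.close();      // the consumer stops reading early
            out.write('b');  // the producer keeps writing and fails
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(writeAfterReaderClosed()); // prints "Pipe closed"
    }
}
```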






