I am a newbie to Lucene, and have been learning by experiment and from the demos. A problem has arisen in indexing a document after creation, and before indexing in the permanent index. It is being indexed to this small lookaside index in order to determine whether it is "sponsored" [i.e. contains any word that causes it to be included in one of the 'sponsored' document levels.] (A separate letter deals with the larger issues of sponsorship.) If it is sponsored, then a setBoost for the document will be issued, with a level-dependent value.
The code in question arises from within IndexHTML
near:
doc = new HTMLDocument(file);
writer.addDocument(doc);In the case at issue, this code has been changed to:
doc = new HTMLDocument(file);
int boost = sponsoredValue(doc);
doc.setBoost(boost);
writer.addDocument(doc);The sponsoredValue method never returns.
The exception occurs after a longish delay in
eclipse, about 2-3 seconds. The document used is:
http://www.w3.org/TR/xquery
stored as a local file. The same document indexes
correctly when the call to sponsoredValue and setBoost
are removed.HTMLDocument was modified in minor ways. HTMLParser is destined for modification, but is still vanilla.
Note that altering RAMDirectory to FSDirectory makes no difference and does not change the behavior.
I greatly Appreciate any help, thank you all.
-------------------------------------------------
the Document doc:
url: Keyword, string
file: Unindexed, string
modified: Keyword, string
uid: as in HTMLdemo, string
contents: Text, reader
title: Text, string
metadata: Text, stringthe code:
private static RAMDirectory ramDir = null; private static IndexWriter ramWriter = null; private static IndexReader ramReader = null; private static IndexSearcher ramSearcher = null;
public int sponsoredValue(Document doc) {
.
.
.
ramDir = new RAMDirectory();
ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
+--> ramWriter.addDocument(doc);
| ramWriter.close();
| ramWriter = null;
| ramReader = IndexReader.open(ramDir);
| ramSearcher = new IndexSearcher(ramReader);
| .
| .
| .
| }
|
the Exception:java.io.IOException: Pipe closed
at java.io.PipedInputStream.receive(Unknown Source)
at java.io.PipedInputStream.receive(Unknown Source)
at java.io.PipedOutputStream.write(Unknown Source)
at sun.nio.cs.StreamEncoder$CharsetSE.writeBytes(Unknown Source)
at sun.nio.cs.StreamEncoder$CharsetSE.implWrite(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at sun.nio.cs.StreamEncoder.write(Unknown Source)
at java.io.OutputStreamWriter.write(Unknown Source)
at java.io.Writer.write(Unknown Source)
at org.apache.lucene.demo.html.HTMLParser.addText(HTMLParser.java:141)
at org.apache.lucene.demo.html.HTMLParser.HTMLDocument(HTMLParser.java:200)
at org.apache.lucene.demo.html.ParserThread.run(ParserThread.java:69)--------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
