I could not find any information about parsing HTML strings.
I tried the code below myself: it creates an index without any errors, but the index does not contain any terms.
Can someone give me a hint?
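import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.demo.html.HTMLParser; // HTML parser from the Lucene demo sources
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public void createIndexFromHTMLString(String sDocument, int DocumentID) throws Exception {
    Document tempDoc = new Document();
    HTMLParser parser = new HTMLParser(new StringReader(sDocument));
    // Stored but not indexed: the ID is only returned with hits.
    tempDoc.add(Field.UnIndexed("ID", "" + DocumentID));
    // Reader-valued text field: tokenized and indexed, but not stored.
    tempDoc.add(Field.Text("content", parser.getReader()));
    // true = create a new index, overwriting any existing one at this path.
    IndexWriter writer = new IndexWriter("c:\\lucene-1.2\\testindex", new StandardAnalyzer(), true);
    writer.maxFieldLength = 1000000;
    writer.addDocument(tempDoc);
    writer.optimize();
    writer.close();
}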
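
One way to narrow the problem down might be to check whether the reader returned by parser.getReader() yields any text at all before the index is involved. The following is only a minimal sketch using the plain JDK: ReaderDump and readAll are hypothetical names, not part of Lucene, and the StringReader in main is a stand-in for parser.getReader().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class ReaderDump {
    // Hypothetical helper: drains a Reader into a String.
    // The Reader is consumed by this call and cannot be reused afterwards.
    public static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for parser.getReader(); swap in the real reader when testing.
        Reader reader = new StringReader("Some text extracted from HTML");
        String text = readAll(reader);
        System.out.println("extracted " + text.length() + " characters");
    }
}
```

If the reported length is 0, the text is already missing before it reaches the IndexWriter. Because draining consumes the reader, the drained text would have to be indexed from the resulting string (for example through a new StringReader over it), not from the original reader.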
Peter Hendrickx
