Hi, I'm just starting to use Lucene.

I could not find any information about indexing HTML that is held in a String.

I tried it myself with the method below. It creates an index without any errors, but the index contains no terms.

Can someone give me a hint?

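import java.io.StringReader;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.demo.html.HTMLParser; // HTML parser from the Lucene demo sources
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public void createIndexFromHTMLString(String sDocument, int documentID) throws Exception {
    Document tempDoc = new Document();
    HTMLParser parser = new HTMLParser(new StringReader(sDocument));
    // Stored but not indexed: the ID comes back with hits but cannot be searched.
    tempDoc.add(Field.UnIndexed("ID", "" + documentID));
    // Reader-valued field: tokenized and indexed, but not stored.
    tempDoc.add(Field.Text("content", parser.getReader()));
    // true = create a new index, overwriting any existing one at this path.
    IndexWriter writer = new IndexWriter("c:\\lucene-1.2\\testindex", new StandardAnalyzer(), true);
    writer.maxFieldLength = 1000000;
    writer.addDocument(tempDoc);
    writer.optimize();
    writer.close();
}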
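
One check that might narrow this down is whether the Reader handed to Field.Text yields any text at all. The following is only a sketch using plain java.io. ReaderCheck and drain are hypothetical names, not part of Lucene, and the StringReader in main is a stand-in for the reader returned by parser.getReader(). A Reader can be consumed only once, so the check would need its own HTMLParser instance rather than the one that is passed to the index.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical debugging helper; not part of Lucene.
class ReaderCheck {

    // Reads everything the Reader produces and returns it as one String.
    static String drain(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            text.append(buffer, 0, count);
        }
        reader.close();
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in input (assumption); with the method above this would be
        // the reader from a separate HTMLParser built on the same string.
        String text = drain(new StringReader("some plain text"));
        System.out.println(text.length() + " chars: " + text);
    }
}
```

If the drained text is empty, the problem would be in the parsing step; if it is not, the indexing step or the way the terms are inspected would be the next place to look.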



Peter Hendrickx

