Parsing HTMLString stored in a database

Peter Hendrickx Fri, 21 Mar 2003 12:37:29 -0800

Hi, I'm just starting to use Lucene.

I could not find any information about parsing HTML strings.

I tried something myself: it creates an index without any errors, but also without any terms.

Can someone give me a hint?

public void createIndexFromHTMLString(String sDocument, int DocumentID) throws Exception {
    Document tempDoc = new Document();
    HTMLParser parser = new HTMLParser(new StringReader(sDocument));
    tempDoc.add(Field.UnIndexed("ID", "" + DocumentID));
    tempDoc.add(Field.Text("content", parser.getReader()));
    IndexWriter writer = new IndexWriter("c:\\lucene-1.2\\testindex", new StandardAnalyzer(), true);
    writer.maxFieldLength = 1000000;
    writer.addDocument(tempDoc);
    writer.optimize();
    writer.close();
}

Peter Hendrickx

