The short answer: there is a great "highlighter" example in the Lucene in Action book. Since you want snippets around the matched text, and HTML output, it sounds like you may just want to use that.
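To make the snippet idea concrete, here is a rough plain-Java sketch of what a highlighter does at its simplest: find the first match, cut a window around it, and wrap the hit in HTML. This is not the Lucene highlighter and not the code from the book. The class name SnippetSketch, the method names, and the fixed character window are my own assumptions for illustration, and it matches a literal term rather than an analysed query.

```java
public class SnippetSketch {

    /**
     * Returns a fragment of text around the first case-insensitive match of term,
     * with the match wrapped in <b> tags, or null when the term does not occur.
     * "context" is the number of characters kept on each side of the match.
     */
    public static String snippet(String text, String term, int context) {
        if (term.isEmpty()) {
            return null;
        }
        // Scan for the first case-insensitive occurrence of the term.
        int hit = -1;
        for (int i = 0; i + term.length() <= text.length(); i++) {
            if (text.regionMatches(true, i, term, 0, term.length())) {
                hit = i;
                break;
            }
        }
        if (hit < 0) {
            return null;
        }
        int matchEnd = hit + term.length();
        int start = Math.max(0, hit - context);
        int end = Math.min(text.length(), matchEnd + context);

        StringBuilder sb = new StringBuilder();
        if (start > 0) {
            sb.append("...");            // text was cut off on the left
        }
        sb.append(escape(text.substring(start, hit)));
        sb.append("<b>").append(escape(text.substring(hit, matchEnd))).append("</b>");
        sb.append(escape(text.substring(matchEnd, end)));
        if (end < text.length()) {
            sb.append("...");            // text was cut off on the right
        }
        return sb.toString();
    }

    /** Escapes the characters that would otherwise be read as HTML markup. */
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    public static void main(String[] args) {
        System.out.println(snippet("Lucene makes full-text search easy to add.", "search", 10));
    }
}
```

Since you can fetch the article body from the database by its ID, something like this could run over the fetched text at display time. The highlighter in the book works on the analysed query terms and picks the best fragments, so it will do a better job than this window-around-first-hit approach.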
On 11/23/05 1:30 PM, "Alan Chandler" <[EMAIL PROTECTED]> wrote:

> I am a brand new newbie with respect to Lucene, and I am just figuring out how
> to include it into an application I am building. (personal blog)
>
> In essence I have a set of articles that reside in a database. Each one will
> have an Integer ID identifying it, Textual Title, some key parameters (such
> as date published, and categories) and body which is composed of text using
> the "textile" format (see http://textism.com/tools/textile/).
>
> I have two sets of questions which I don't quite understand
>
> 1) The Analyser
>
> Since the body has some special syntax, I assume I have to extend the analyser
> to skip the special symbols etc. Has anyone done this already? Is there a
> standard place to look? If not, do I have to start again from scratch, or can
> I just "configure" an existing one? (In particular, I have a routine which
> will take a textile input string and produce an html output string - so could
> I use the HTMLParser in the demo - alternatively JavaCC - is that something I
> could use? - just came across it whilst writing this mail)
>
> I ultimately want to put a summary of the text on the front portion of my web
> site. In order to calculate where the split is, and therefore how many
> articles to place it would be useful as I am analysing it to get some
> statistics like where is the end of the first paragraph. Is there a "hook"
> that I can plug into to get that information out (I scanned the javadocs, but
> I can't find anything obvious).
>
> 2) Use of different field types.
>
> I am struggling to understand what field types I need for my different fields.
>
> For instance, I will want to index all the body of the article, so that the
> words it contains show up in searches, and I will also want to output the
> snippet around where the text is on a search page. However I can easily
> retrieve the article from the database given its ID. Would I therefore make
> the ID of the article a keyword, and the body of it unstored? and would I
> build a special space separated string of the (undetermined number of)
> categories and make them normal.
>
> TIA
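
On the analyser and first-paragraph questions above, one option that avoids writing a custom analyser is to strip the markup before the text is handed to the indexer, and to find the paragraph break on that plain text yourself. A minimal sketch, assuming plain Java with no Lucene calls at all: the class name TextileSketch and its methods are hypothetical, and the patterns cover only a few Textile constructs (block prefixes such as "h1." and "bq.", quoted links, and *bold* / _italic_ spans), not the whole format.

```java
import java.util.regex.Pattern;

public class TextileSketch {

    // Block signatures at the start of a line, e.g. "h1. ", "bq. ", "p. ".
    private static final Pattern BLOCK_PREFIX =
            Pattern.compile("(?m)^(?:h[1-6]|bq|p)\\.[ \\t]+");

    // Links of the form "link text":http://example.com (keep the text only).
    private static final Pattern LINK =
            Pattern.compile("\"([^\"]+)\":\\S+");

    // *bold* or _italic_ spans that are not embedded inside a word.
    private static final Pattern INLINE =
            Pattern.compile("(?<![\\p{L}\\p{N}])([*_])([^*_\\n]+)\\1(?![\\p{L}\\p{N}])");

    /** Removes the handful of Textile constructs listed above. */
    public static String toPlainText(String textile) {
        String s = BLOCK_PREFIX.matcher(textile).replaceAll("");
        s = LINK.matcher(s).replaceAll("$1");
        s = INLINE.matcher(s).replaceAll("$2");
        return s;
    }

    /**
     * Returns the offset where the first paragraph ends, taking a blank line
     * as the paragraph separator, or the text length if there is none.
     */
    public static int endOfFirstParagraph(String text) {
        int blank = text.indexOf("\n\n");
        return blank < 0 ? text.length() : blank;
    }

    public static void main(String[] args) {
        String plain = toPlainText(
                "h1. Title\n\nSome *bold* and _italic_ text with a \"link\":http://example.com here.");
        System.out.println(plain);
        System.out.println(endOfFirstParagraph(plain));
    }
}
```

The plain text could then be indexed with a stock analyser, and the paragraph offset used to decide where to cut the front-page summary. If the existing textile-to-html routine is reliable, stripping tags from its output would be another route to the same plain text.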