On Wednesday 23 Nov 2005 20:30, Erik Hatcher wrote:
> On 23 Nov 2005, at 14:30, Alan Chandler wrote:
> > 1) The Analyser
>
> First you'll have to spell it the US English way :)

You mean yet another corruption of my language :-) I am still having trouble
with color rather than colour in all my CSS files.

...

> I don't know of a Textile analyzer - it looks like you could simply
> configure all of its special symbols as a list of stop words and hand
> it to StandardAnalyzer's constructor.

Might be possible; the really difficult ones are the URLs etc.

> You could go to the trouble
> of converting to HTML and then parse that, but that would be overkill
> and of course slower.

Well, I have to put it into HTML to display it on a web page, so it's a form
it will exist in at some stage.

> > I ultimately want to put a summary of the text on the front portion
> > of my web site. In order to calculate where the split is, and
> > therefore how many articles to place it would be useful as I am
> > analysing it to get some statistics like where is the end of the
> > first paragraph. Is there a "hook" that I can plug into to get that
> > information out (I scanned the javadocs, but I can't find anything
> > obvious).
>
> No, there is nothing special in an analyzer to help with this. It'd
> probably be best to create a parser for Textile that can give you
> back the raw text without the markup and also give you back the first
> paragraph.

I think you are probably right. I am just looking at the demo HTML parser and
seeing how that's built from the JavaCC stuff - it looks to be something I
could usefully study some more.

> > 2) Use of different field types.

...

> All of those options are possible and there is no Lucene "best way"
> to do it. You could easily use Lucene itself as the entire blog
> storage mechanism if you like, even :)

I hadn't thought about it until you mentioned it. Indeed, that might be the
right way to go (the database is little more than an article store and some
verification tables for the article status (published or not) and category).

Thanks
-- 
Alan Chandler
http://www.chandlerfamily.org.uk
Open Source.
It's the difference between trust and antitrust.
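
The suggestion in the thread (a small Textile parser that returns the raw text without markup and also the first paragraph) could be sketched roughly as below. This is only an illustration under stated assumptions, not code from either poster and not part of Lucene: the class name TextileSummary and its methods are hypothetical, it uses plain regular expressions rather than JavaCC, and it handles only a small subset of Textile (block prefixes such as "p." and "h2.", quoted links, and *strong* / _emphasis_ markers).

```java
import java.util.regex.Pattern;

// Hypothetical helper, not a Lucene class: strips a small subset of Textile.
class TextileSummary {
    // Block signatures such as "p. ", "bq. " or "h2. " at the start of a block.
    private static final Pattern BLOCK_PREFIX =
            Pattern.compile("^(?:p|bq|h[1-6])\\.\\s+");
    // Links written as "label":http://example.org/ ; group 1 is the label.
    private static final Pattern LINK = Pattern.compile("\"([^\"]+)\":\\S+");
    // *strong* and _emphasis_ phrases; group 2 is the enclosed text.
    // Assumption: underscores inside ordinary words are not expected.
    private static final Pattern INLINE =
            Pattern.compile("([*_])(\\S(?:.*?\\S)?)\\1");

    /** Returns the first non-empty block (text up to the first blank line), markup removed. */
    static String firstParagraph(String textile) {
        for (String block : textile.trim().split("\\n\\s*\\n")) {
            String trimmed = block.trim();
            if (!trimmed.isEmpty()) {
                return stripMarkup(trimmed);
            }
        }
        return "";
    }

    /** Removes the block prefix, keeps link labels, and drops inline markers. */
    static String stripMarkup(String block) {
        String s = BLOCK_PREFIX.matcher(block).replaceFirst("");
        s = LINK.matcher(s).replaceAll("$1");
        s = INLINE.matcher(s).replaceAll("$2");
        return s;
    }
}
```

The stripped text from stripMarkup could then be handed to whatever analyzer is in use for indexing, while firstParagraph supplies the front-page summary; the length of that first paragraph is the kind of statistic asked about above.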