Yes, you can use just the Analysis part. For instance, I use this for http://www.simpy.com and I believe we also have this in the Lucene book as part of the source code package:
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;

/**
 * Gets Tokens extracted from the given text, using the specified Analyzer.
 *
 * @param analyzer the <code>Analyzer</code> to use
 * @param text the text to analyze
 * @param field the field to pass to the Analyzer for tokenization
 * @return an array of <code>Token</code>s
 * @exception IOException if an error occurs
 */
public static Token[] getTokens(Analyzer analyzer, String text, String field)
    throws IOException
{
    TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
    ArrayList tokenList = new ArrayList();
    try {
        while (true) {
            // next() returns null once the stream is exhausted
            Token token = stream.next();
            if (token == null) break;
            tokenList.add(token);
        }
    } finally {
        stream.close();
    }
    return (Token[]) tokenList.toArray(new Token[0]);
}

Otis

--- [EMAIL PROTECTED] wrote:
>
> Is there a way to use Lucene stemming and stop word removal without
> using the rest of the tool? I am downloading the code now, but I
> imagine the answer might be deeply buried. I would like to be able
> to send in a phrase and get back a collection of keywords if
> possible.
>
> I am thinking of using an intermediary solution before moving fully
> to Lucene. I don't have time to spend a month making a carefully
> tested, administrable Lucene solution for my site yet, but I intend
> to do so over time. Funny thing is the Lucene code likely would only
> take up a couple hundred lines, but integration and administration
> would take me much more time.
>
> In the meantime, I am thinking I could use Lucene stemming and
> parsing of words, then store each search word along with the
> associated primary key in an indexed MySQL table. Each record I
> would need to do this for is small, with on average maybe only 15
> useful words. I would then have an in-database solution, though
> ranking, etc. would not exist. This is better than the exact word
> searching I have currently, which is really bad.
>
> By the way, MySQL 4.1.1 has some Lucene-type handling, but it too
> does not have stemming and I am sure it is very slow compared to
> Lucene. cPanel is still stuck on MySQL 4.0.*, so many people would
> not have access to even this basic ability in production systems for
> some time yet.
>
> JohnE
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
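
To make the interim approach from the quoted message concrete (one row per search word and primary key), here is a minimal sketch. It does not use Lucene at all. The class KeywordTableSketch, its methods, the tiny stop list, and the crude trailing-"s" stripping are all hypothetical stand-ins for what an Analyzer would produce, and an in-memory map stands in for the indexed MySQL table. It also assumes a newer Java (8 or later) than the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: keyword -> primary keys, standing in for an indexed table. */
public class KeywordTableSketch {

    // Tiny illustrative stop list (an assumption, not Lucene's own list).
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "in", "of", "the", "to"));

    /** Stand-in for analyzer output: lowercase, split on non-letters, drop stop words. */
    public static List<String> extractKeywords(String text) {
        List<String> keywords = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (word.isEmpty() || STOP_WORDS.contains(word)) {
                continue;
            }
            // Crude stand-in for stemming: strip a trailing "s" from longer words.
            if (word.length() > 3 && word.endsWith("s")) {
                word = word.substring(0, word.length() - 1);
            }
            keywords.add(word);
        }
        return keywords;
    }

    /** Adds one (keyword, primaryKey) "row" for each keyword found in the record text. */
    public static void indexRecord(Map<String, Set<Integer>> table, int primaryKey, String text) {
        for (String keyword : extractKeywords(text)) {
            table.computeIfAbsent(keyword, k -> new TreeSet<>()).add(primaryKey);
        }
    }

    /** Returns the primary keys of records that contain every keyword in the query. */
    public static Set<Integer> search(Map<String, Set<Integer>> table, String query) {
        Set<Integer> result = null;
        for (String keyword : extractKeywords(query)) {
            Set<Integer> ids = table.getOrDefault(keyword, new TreeSet<>());
            if (result == null) {
                result = new TreeSet<>(ids);
            } else {
                result.retainAll(ids);
            }
        }
        return result == null ? new TreeSet<>() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> table = new TreeMap<>();
        indexRecord(table, 1, "Books in the library");
        indexRecord(table, 2, "A book of maps");
        System.out.println(search(table, "book")); // prints [1, 2]
    }
}
```

In a real setup, the keywords would come from something like the getTokens() output rather than extractKeywords(), and the map would be a two-column table with an index on the keyword column. As noted in the question, there is no ranking here, only matching.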