This is so cool, Otis. I was just about to write this based on something in the FAQ, but this is better than what I was doing.
This rocks!!! Thank you.

JohnE

P.S.: I am assuming you use org.apache.lucene.analysis.Token? There are three Token classes in Lucene.

----- Original Message -----
From: Otis Gospodnetic <[EMAIL PROTECTED]>
Date: Wednesday, November 17, 2004 7:17 pm
Subject: Re: Considering intermediary solution before Lucene question

> Yes, you can use just the Analysis part. For instance, I use this for
> http://www.simpy.com and I believe we also have this in the Lucene
> book as part of the source code package:
>
> /**
>  * Gets Tokens extracted from the given text, using the specified Analyzer.
>  *
>  * @param analyzer the <code>Analyzer</code> to use
>  * @param text the text to analyze
>  * @param field the field to pass to the Analyzer for tokenization
>  * @return an array of <code>Token</code>s
>  * @exception IOException if an error occurs
>  */
> public static Token[] getTokens(Analyzer analyzer, String text, String field)
>     throws IOException
> {
>     TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
>     ArrayList tokenList = new ArrayList();
>     while (true) {
>         Token token = stream.next();
>         if (token == null)
>             break;
>         tokenList.add(token);
>     }
>     return (Token[]) tokenList.toArray(new Token[0]);
> }
>
> Otis
>
> --- [EMAIL PROTECTED] wrote:
>
> > Is there a way to use Lucene stemming and stop word removal without
> > using the rest of the tool? I am downloading the code now, but I
> > imagine the answer might be deeply buried. I would like to be able
> > to send in a phrase and get back a collection of keywords if
> > possible.
> >
> > I am thinking of using an intermediary solution before moving fully
> > to Lucene. I don't have time to spend a month making a carefully
> > tested, administrable Lucene solution for my site yet, but I intend
> > to do so over time. Funny thing is the Lucene code likely would only
> > take up a couple hundred lines, but integration and administration
> > would take me much more time.
> >
> > In the meantime, I am thinking I could perhaps use Lucene stemming and
> > parsing of words, then store each search word along with the
> > associated primary key in an indexed MySQL table. Each record I
> > would need to do this for is small, with maybe only 15 useful
> > words on average. I would be able to have an in-database solution, though
> > ranking, etc. would not exist. This is better than the exact word
> > searching I have currently, which is really bad.
> >
> > By the way, MySQL 4.1.1 has some Lucene-type handling, but it too
> > does not have stemming and I am sure it is very slow compared to
> > Lucene. cPanel is still stuck on MySQL 4.0.*, so many people would
> > not have access to even this basic ability in production systems for
> > some time yet.
> >
> > JohnE
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
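
The interim idea quoted above (one row per search word, stored with the record's primary key) could be sketched roughly like this. It is only an illustration in plain Java and deliberately does not use Lucene: the class name `KeywordRows`, the tiny stop list, and the tab-separated row format are all assumptions made up for the example, and there is no stemming here. In a real setup the `keywords` step would presumably be replaced by a Lucene Analyzer, along the lines of the `getTokens` method Otis posted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeywordRows {
    // Tiny illustrative stop list (an assumption; not Lucene's own list).
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "the", "of", "to", "in", "is", "for", "on"));

    /**
     * Lowercases the text, splits it on anything that is not a letter or
     * digit, and drops stop words and duplicates. No stemming is applied.
     */
    public static List<String> keywords(String text) {
        Set<String> unique = new LinkedHashSet<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                unique.add(word);
            }
        }
        return new ArrayList<>(unique);
    }

    /**
     * Builds one "keyword<TAB>primaryKey" row per keyword, i.e. the rows
     * that would be inserted into the indexed keyword table.
     */
    public static List<String> rows(long primaryKey, String text) {
        List<String> result = new ArrayList<>();
        for (String keyword : keywords(text)) {
            result.add(keyword + "\t" + primaryKey);
        }
        return result;
    }

    public static void main(String[] args) {
        // Print the rows for a hypothetical record with primary key 42.
        for (String row : rows(42L, "The history of the Lucene project")) {
            System.out.println(row);
        }
    }
}
```

Searching would then be a plain indexed lookup on the keyword column, run through the same `keywords` step so that query words and stored words are normalized the same way. As noted in the quoted message, this gives no ranking.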