This is so cool, Otis.  I was just about to write this based on something in the FAQ, 
but this is better than what I was doing.

This rocks!!!  Thank you.

JohnE

P.S.:  I am assuming you use org.apache.lucene.analysis.Token?   There are 
three Token classes in Lucene.
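
For concreteness, the interim approach from the original message quoted below (drop stop words, stem each remaining word, store each keyword next to the record's primary key) might look roughly like this. It is only a sketch under stated assumptions: the stop-word list and the crudeStem suffix rule are made-up stand-ins for what a Lucene Analyzer would do, not Lucene's actual stemming, and the class name, method names, and record id are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns a phrase into distinct, crudely stemmed keywords.
final class KeywordSketch {

    // Assumption: a tiny illustrative stop-word list, not Lucene's.
    private static final Set<String> STOP_WORDS = new HashSet<>(
            Arrays.asList("a", "an", "and", "the", "of", "to", "in", "is", "for"));

    // Naive suffix stripping; a placeholder for a real stemmer.
    static String crudeStem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    // Lowercases, splits on non-alphanumerics, drops stop words, stems,
    // and removes duplicates while keeping first-seen order.
    static List<String> keywords(String phrase) {
        Set<String> result = new LinkedHashSet<>();
        for (String raw : phrase.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (raw.isEmpty() || STOP_WORDS.contains(raw)) {
                continue;
            }
            result.add(crudeStem(raw));
        }
        return new ArrayList<>(result);
    }

    public static void main(String[] args) {
        long recordId = 42L; // hypothetical primary key of the record being indexed
        // Each printed pair corresponds to one row of the keyword table.
        for (String keyword : keywords("The searching of indexed tables")) {
            System.out.println(keyword + " -> " + recordId);
        }
    }
}
```

The same keywords() call would be applied to the user's search phrase, so that the query terms and the stored keywords are normalized the same way before the table lookup.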



----- Original Message -----
From: Otis Gospodnetic <[EMAIL PROTECTED]>
Date: Wednesday, November 17, 2004 7:17 pm
Subject: Re: Considering intermediary solution before Lucene question

> Yes, you can use just the Analysis part.  For instance, I use this for
> http://www.simpy.com and I believe we also have this in the Lucene
> book as part of the source code package:
> 
>    /**
>     * Gets Tokens extracted from the given text, using the specified Analyzer.
>     *
>     * @param analyzer the <code>Analyzer</code> to use
>     * @param text the text to analyze
>     * @param field the field to pass to the Analyzer for tokenization
>     * @return an array of <code>Token</code>s
>     * @exception IOException if an error occurs
>     */
>    public static Token[] getTokens(Analyzer analyzer, String text, String field)
>        throws IOException
>    {
>        TokenStream stream = analyzer.tokenStream(field, new StringReader(text));
>        ArrayList tokenList = new ArrayList();
>        while (true) {
>            Token token = stream.next();
>            if (token == null)
>                break;
>            tokenList.add(token);
>        }
>        return (Token[]) tokenList.toArray(new Token[0]);
>    }
> 
> Otis
> 
> --- [EMAIL PROTECTED] wrote:
> 
> > 
> > Is there a way to use Lucene stemming and stop word removal without
> > using the rest of the tool?   I am downloading the code now, but I
> > imagine the answer might be deeply buried.  I would like to be able
> > to send in a phrase and get back a collection of keywords if
> > possible.
> > 
> > I am thinking of using an intermediary solution before moving fully
> > to Lucene.  I don't have time to spend a month making a carefully
> > tested, administrable Lucene solution for my site yet, but I intend
> > to do so over time.  Funny thing is the Lucene code likely would only
> > take up a couple hundred lines, but integration and administration
> > would take me much more time.
> > 
> > In the meantime, I am thinking I could perhaps use Lucene stemming and
> > parsing of words, then stick each search word along with the
> > associated primary key in an indexed MySQL table.   Each record I
> > would need to do this for is small, with maybe only 15 useful words
> > on average.   I would be able to have an in-database solution, though
> > ranking, etc. would not exist.   This is better than the exact word
> > searching I have currently, which is really bad.
> > 
> > By the way, MySQL 4.1.1 has some Lucene-type handling, but it too
> > does not have stemming and I am sure it is very slow compared to
> > Lucene.   cPanel is still stuck on MySQL 4.0.* so many people would
> > not have access to even this basic ability in production systems for
> > some time yet.
> > 
> > JohnE
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > 
> > 
> 
> 
> 
> 

