The generic string transducer kit could become a fine and widely used Lucene contrib tool, but it could also become more than that: a standalone tool like Snowball. The formal language Rodrigo describes is quite powerful and allows for a great deal of flexibility.
What I was trying to say is that it doesn't need to be plugged in. But thinking it over and reading your comments, I now understand that having it output Analyzer code could be quite nice and would enforce index/search analyzer synchronization.

-----Original Message-----
From: Brian Goetz [mailto:[EMAIL PROTECTED]]
Sent: Monday, March 11, 2002 5:20 PM
To: Lucene Developers List
Subject: Re: Normalization

> As I have said before in this list, this gets way off of Lucene. The
> normalizer, or the morphologic analyzer, or the phonetic transducer, or
> the stemmer, or the thesaurus -- they all could be stand-alone
> products.

I think that as Lucene matures, ALL of the sample implementations of Analyzers (SimpleAnalyzer, StandardAnalyzer, the Porter stemmer) should be moved out of the "core" project and into the "library" of plug-ins, leaving the core with only interfaces and perhaps the most basic building blocks (WordTokenizer, LowerCaseFilter). Until recently, few plug-ins have been available, but this is changing, and eventually we will want to recognize that. I think a good step would be to create a separate Lucene subproject for Analyzers and other plug-ins, and we could give out commit privileges more widely to people who have that domain expertise.
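For what it's worth, here is a rough Java sketch of that "output Analyzer code" idea. The GeneratedNormalizerAnalyzer class and its filter chain are made-up placeholders for whatever the kit would actually generate, and the tokenStream signature follows the classic Analyzer API (constructor and method signatures differ in newer Lucene versions); the point is only that using one generated class at both index and search time is what keeps the two sides in sync.

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LetterTokenizer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;

// Hypothetical output of the transducer kit: a self-contained Analyzer
// whose token-filter chain encodes the normalization rules.
public class GeneratedNormalizerAnalyzer extends Analyzer {
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new LetterTokenizer(reader);
        stream = new LowerCaseFilter(stream);
        // ...further generated filters (stemming, phonetic transduction, etc.)
        return stream;
    }
}

// Using the same generated class on both sides keeps indexing and
// querying normalization identical, e.g.:
//
//   IndexWriter writer = new IndexWriter(dir, new GeneratedNormalizerAnalyzer(), true);
//   Query query = QueryParser.parse(userInput, "contents", new GeneratedNormalizerAnalyzer());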
