My unsolicited two cents: I like Brian's idea. Still, I'm curious whether it would be possible (and prudent) to allow a little more flexibility. In some cases it might be useful to use different but compatible Analyzers for indexing and searching.
An example would be to index all words and then remove stopwords from queries at search time. If I understand the process correctly, this would achieve several things. First, it would decrease (not eliminate, I admit) the influence of stopwords on scoring, resulting in more relevant results. Second, it would preserve information about word proximity and, depending on what you're interested in, make queries that use a slop factor more meaningful. Finally, if you wanted, you could let a user choose at search time whether or not to remove stopwords from queries, using different Analyzers against the same index.

The merit of this particular example may be debatable, but that is less relevant to the current discussion. The point is that it might be desirable to use different Analyzers for indexing and searching.

So... might there be a compromise? Is there a way to record the type of Analyzer used to create an index and to require that searches use a compatible Analyzer, without requiring the exact same one? I had thought that compatible Analyzers could implement the same empty (marker) interface, but that would be difficult to do with Analyzers created from rules, wouldn't it?

I'm curious to hear what you folks think.

-Lex

>From: Brian Goetz <[EMAIL PROTECTED]>
>Reply-To: "Lucene Developers List" <[EMAIL PROTECTED]>
>To: Lucene Developers List <[EMAIL PROTECTED]>
>Subject: Re: Normalization
>Date: Mon, 11 Mar 2002 14:56:18 -0800
>
> > Isn't this really a property of an index rather than an entire Lucene
> > build?
>
>Technically no, but in spirit, yes.
>
>Personally, I always liked the idea of creating an Analyzer at index
>creation time, and having the Analyzer object stored as a serialized
>object in the index. Then you couldn't make the all-too-common
>mistake of indexing with one and then trying to search with another.
>
> > If so, having a text-based way to describe a policy is very helpful
> > and better than a source code-based one.
>
>yup.
>
>--
>To unsubscribe, e-mail:
><mailto:[EMAIL PROTECTED]>
>For additional commands, e-mail:
><mailto:[EMAIL PROTECTED]>
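To make the stopword example and the marker-interface idea in the message above concrete, here is a minimal, self-contained sketch in plain Java (16+, for records). It is written against no real Lucene API. Every name in it is hypothetical and used only for illustration: `TextAnalyzer`, `IndexAllWords`, `QueryWithoutStopwords`, the empty `LowercaseWordFamily` marker, and `phraseMatches`. In particular, `phraseMatches` is a simplified proximity check and not Lucene's sloppy-phrase algorithm. The index-side analyzer keeps stopwords so their positions are recorded. The query-side analyzer shares the same normalization but drops stopwords, and each remaining term keeps its original position in the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration only; none of these types are Lucene classes. */
public class CompatibleAnalyzers {

    /** A term together with its position in the analyzed text. */
    record Token(String term, int position) {}

    /** Hypothetical stand-in for an Analyzer: text in, positioned tokens out. */
    interface TextAnalyzer {
        List<Token> analyze(String text);
    }

    /** Hypothetical empty marker: "my terms are lowercased words split on \W+". */
    interface LowercaseWordFamily {}

    /** Shared normalization: lowercase, then split on runs of non-word characters. */
    static List<Token> tokenize(String text) {
        List<Token> out = new ArrayList<>();
        int pos = 0;
        for (String w : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!w.isEmpty()) {             // split yields "" for leading punctuation
                out.add(new Token(w, pos++));
            }
        }
        return out;
    }

    /** Index-time analyzer: keeps every word, so stopword positions are recorded. */
    static class IndexAllWords implements TextAnalyzer, LowercaseWordFamily {
        public List<Token> analyze(String text) {
            return tokenize(text);
        }
    }

    /** Query-time analyzer: same normalization, stopwords dropped, gaps preserved. */
    static class QueryWithoutStopwords implements TextAnalyzer, LowercaseWordFamily {
        private final Set<String> stopwords;

        QueryWithoutStopwords(Set<String> stopwords) {
            this.stopwords = stopwords;
        }

        public List<Token> analyze(String text) {
            List<Token> out = new ArrayList<>();
            for (Token t : tokenize(text)) {
                if (!stopwords.contains(t.term())) {
                    out.add(t);             // original position kept, so a gap remains
                }
            }
            return out;
        }
    }

    /** Marker-interface test: both analyzers merely claim the same family. */
    static boolean compatible(TextAnalyzer a, TextAnalyzer b) {
        return a instanceof LowercaseWordFamily && b instanceof LowercaseWordFamily;
    }

    /** True if `term` occurs in `doc` within `slop` positions of `expected`. */
    static boolean occursNear(List<Token> doc, String term, int expected, int slop) {
        for (Token d : doc) {
            if (d.term().equals(term) && Math.abs(d.position() - expected) <= slop) return true;
        }
        return false;
    }

    /** Simplified proximity check (not Lucene's): anchored on each occurrence of the
        first query term, every query term must sit near the offset the query implies. */
    static boolean phraseMatches(List<Token> doc, List<Token> query, int slop) {
        if (query.isEmpty()) return false;
        Token first = query.get(0);
        for (Token anchor : doc) {
            if (!anchor.term().equals(first.term())) continue;
            boolean allFound = true;
            for (Token q : query) {
                int expected = anchor.position() + (q.position() - first.position());
                if (!occursNear(doc, q.term(), expected, slop)) { allFound = false; break; }
            }
            if (allFound) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        TextAnalyzer indexer = new IndexAllWords();
        TextAnalyzer searcher = new QueryWithoutStopwords(Set.of("a", "of", "the"));
        List<Token> query = searcher.analyze("bank of america");   // bank@0, america@2

        System.out.println(compatible(indexer, searcher));                                           // true
        System.out.println(phraseMatches(indexer.analyze("The Bank of America branch"), query, 0));  // true
        System.out.println(phraseMatches(indexer.analyze("bank america"), query, 0));                // false
        System.out.println(phraseMatches(indexer.analyze("bank america"), query, 1));                // true
    }
}
```

Because stopwords were indexed, the query `bank of america` matches a document containing that exact phrase at slop 0. A document containing only `bank america` needs a slop of 1. That is the proximity information the message says indexing all words would preserve. Note that the marker interface only records a *claim* of compatibility. Nothing verifies that the two analyzers really normalize terms the same way, which is the difficulty the message raises for Analyzers built from rules.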
