On 8/21/07, Ard Schrijvers <[EMAIL PROTECTED]> wrote:

> ...So would you like to see parts like chaining of filters for indexing a
> property? Think that shouldn't be too hard to implement....

If that's within the scope of your work, that would IMHO be very useful,
to give people precise control over how the various properties are indexed.

> ...Certainly something like
>
> <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
>     ignoreCase="true" expand="false"/>
> <filter class="solr.StopFilterFactory" ignoreCase="true"
>     words="stopwords.txt"/>
>
> would of course ease the use of implementing synonyms/stopwords yourself....

Yes, given that many Lucene TokenFilters are available, I think this is
useful.

I see two potential issues that you might want to take into account:

1) With configurable indexing analyzers, people sometimes have a hard
time figuring out exactly how their data is indexed (and why they don't
find it later). Solr provides an analysis test page for that (see
"Solr's content analysis test page" in [1]). In the case of Jackrabbit,
maybe logging the filtered values of fields at the DEBUG level would
help.

2) As discussed previously, one problem with this is which analyzer to
use when running a query that applies to several fields. In Solr, you
can configure a different analyzer for querying, which is probably the
best solution. People then have to make sure their config is consistent
for indexing and querying, and might in some cases need to provide their
own custom QueryAnalyzer to achieve this. For example, one that provides
fake synonyms for a token, with each synonym being the result of one of
the analysis methods used. This can get tricky depending on the
configured analysis, when searching in multiple fields.

See also http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for more info on how Solr manages the analyzers.

-Bertrand

[1] http://www.xml.com/lpt/a/1668
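
To illustrate point 2, here is a minimal sketch of how a Solr schema.xml
can declare one analyzer chain for indexing and another for querying on
the same field type. The field type name "text_example", the choice of
tokenizer, the lowercase filter, and the decision to apply synonyms only
at index time are assumptions made for illustration; the synonym and
stopword filter declarations are the ones quoted above.

```xml
<!-- Hypothetical field type: the name and the tokenizer are illustrative assumptions -->
<fieldType name="text_example" class="solr.TextField">
  <!-- Chain applied when documents are indexed -->
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="false"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Chain applied to query text; it must stay consistent with the index chain -->
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Keeping the tokenizer and the stopword and lowercase filters identical in
both chains is one way to keep indexing and querying consistent. Whether
synonyms belong at index time, query time, or both depends on the setup.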
