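> > import java.io.Reader;
> > import java.util.Map;
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > class JRAnalyzer extends Analyzer {
> >     Analyzer defaultAnalyzer = new StandardAnalyzer();
> >     // property name -> Analyzer; assumed to be filled from the configuration elsewhere
> >     Map propertyAnalyzerMap;
> >     public TokenStream tokenStream(String fieldName, Reader reader) {
> >         Analyzer analyzer = (Analyzer) propertyAnalyzerMap.get(fieldName);
> >         if (analyzer != null) {
> >             return analyzer.tokenStream(fieldName, reader); // property-specific analyzer
> >         }
> >         return defaultAnalyzer.tokenStream(fieldName, reader); // fall back to the default
> >     }
> > }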
> >
> > This very same JRAnalyzer is also used for the QueryParser in
> > LuceneQueryBuilder, so IIUC this will also work for searching. So, WDYT? I
> > can implement it and send a patch, but if the community is reluctant about
> > it, I will have to do it for myself in a way that does not intrude on the
> > JR code.
>
> This would work quite well for jcr:contains functions that operate on a
> property. However, I'm not sure what to do with this:
>
> //*[jcr:contains(., 'hägar')]
>
> The node scope does not indicate which analyzer to use for the query
> statement. Would we just run the statement through all analyzers and
> combine them in an OR query?
Hmm, good point :-) OR-ing the terms from all analyzers seems wrong to me
(apart from being possibly inefficient), because you might get results you
should not get. I only know a Dutch example: suppose you index "branden"
(= burn) with the Dutch analyzer. Because of stemming, this results in the
term "brand". Now OR-ing might return hits in English text that contains
"brand", which you aren't looking for at all. Anyway, you have a good point
about this problem, but since I think multilingual indexing might be quite
useful, I'll give it another thought.
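To make the false-match problem concrete, here is a toy model in plain Java (no Lucene). The two "analyzers" are hypothetical stand-ins, not Lucene's DutchAnalyzer or StandardAnalyzer: the Dutch one just lowercases and strips a trailing "en" as a crude stemmer, the English one only lowercases. Running the node-scope query "branden" through both and OR-ing the resulting terms yields "brand", which then matches an English document that merely mentions a brand.

```java
import java.util.*;
import java.util.function.Function;

// Toy model only: both "analyzers" are hypothetical stand-ins for real Lucene analyzers.
public class OrAcrossAnalyzers {

    // Hypothetical Dutch-like analyzer: lowercase, then strip a trailing "en" (crude stemming).
    public static String dutch(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.endsWith("en") && w.length() > 4) ? w.substring(0, w.length() - 2) : w;
    }

    // Hypothetical English-like analyzer: lowercase only.
    public static String english(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    // Node scope: run the query word through every analyzer and OR the resulting terms.
    public static Set<String> orTerms(String query, List<Function<String, String>> analyzers) {
        Set<String> terms = new LinkedHashSet<>();
        for (Function<String, String> analyzer : analyzers) {
            terms.add(analyzer.apply(query));
        }
        return terms;
    }

    // A document matches if any of its indexed terms equals one of the OR-ed query terms.
    public static boolean matches(Set<String> indexedTerms, Set<String> queryTerms) {
        for (String term : queryTerms) {
            if (indexedTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Function<String, String>> all =
                List.of(OrAcrossAnalyzers::dutch, OrAcrossAnalyzers::english);
        Set<String> query = orTerms("branden", all);       // [brand, branden]
        Set<String> englishDoc = Set.of(english("brand"));  // English text about a brand name
        System.out.println(query + " matches English doc: " + matches(englishDoc, query));
    }
}
```

The English document never mentioned burning anything; it matches only because the Dutch stem happens to be an English word.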
>
> > Example of the SynonymProvider mentioned at the top:
> >
> > If my suggested changes are accepted, things like a SynonymProvider become
> > superfluous, because synonym support is very easy to add on the fly:
> >
> > Suppose I always want full-text searching with Dutch synonyms on the "body"
> > property of my nodes. This boils down to adding an analyzer for this
> > property that extends the DutchAnalyzer in Lucene and adds synonym
> > functionality (there is a very simple example in the "Lucene in Action"
> > book). I think it is better to handle synonyms during analysis (as opposed
> > to the SynonymProvider in the JR trunk) and simply use an analyzer for it.
> > Of course, there is a difference in usage: with the current SynonymProvider
> > you have to state explicitly that you are doing a synonym search (~term),
> > while with an analyzer you define which properties should be indexed with a
> > synonym analyzer, and they are searched accordingly (without having to
> > specify it).
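A rough sketch of what such a synonym analyzer does, using a toy token model in plain Java rather than Lucene's TokenStream API. The idea, as I remember it from the "Lucene in Action" SynonymFilter, is to emit each synonym right after the original word with a position increment of 0, so it occupies the same position and phrase queries keep working. The synonym table and the `analyze` method are hypothetical, for illustration only.

```java
import java.util.*;

// Toy token model, not Lucene's API: each token carries a position increment,
// like a Lucene token does.
public class SynonymExpansion {

    public record Token(String term, int positionIncrement) {}

    // Hypothetical Dutch synonym table; a real analyzer would load this from a file.
    static final Map<String, List<String>> SYNONYMS = Map.of(
            "snel", List.of("vlug", "rap"));

    // Split on whitespace; after each word, inject its synonyms with position
    // increment 0 so they sit at the same position as the original word.
    public static List<Token> analyze(String text) {
        List<Token> out = new ArrayList<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace yields an empty first element
            }
            out.add(new Token(word, 1));
            for (String synonym : SYNONYMS.getOrDefault(word, List.of())) {
                out.add(new Token(synonym, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (Token t : analyze("een snel antwoord")) {
            System.out.println(t.term() + " +" + t.positionIncrement());
        }
    }
}
```

Because the expansion happens while the text is being analyzed, every indexed "body" value carries the synonyms, and no special query syntax is needed.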
>
> Well, those are actually the reasons why I implemented it the other way. If
> you go the analyzer way to expand synonyms, you have to re-index the complete
> content if you want to add a single synonym.
No, IIUC this is not the case: adding a synonym directly results in an extra
OR term (though that is not really important for the issue at hand).
> I also wanted the user to decide whether synonyms should be considered.
> Again, this would not be possible if the analyzer automatically adds
> synonyms.
>
> But fortunately, with Jackrabbit both are possible ;) If one prefers to
> expand terms at index time, just use an appropriate analyzer and don't
> configure a SynonymProvider.
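For contrast, a toy sketch of the query-time, opt-in approach: a term is expanded into an OR group only when the user prefixes it with `~`. The `SynonymSource` interface and the `expand` method are hypothetical stand-ins, not Jackrabbit's actual SynonymProvider interface or query builder, and the output is a plain query string rather than a Lucene query object.

```java
import java.util.*;

// Hypothetical query-time expansion; not Jackrabbit's real SynonymProvider API.
public class QueryTimeSynonyms {

    // Hypothetical synonym source, consulted only at query time.
    public interface SynonymSource {
        List<String> synonymsOf(String term);
    }

    // Expand into an OR group only when the user asks for it with '~';
    // without '~' the term is searched as-is, so synonyms stay opt-in.
    public static String expand(String queryTerm, SynonymSource source) {
        if (!queryTerm.startsWith("~")) {
            return queryTerm;
        }
        String term = queryTerm.substring(1);
        List<String> alternatives = new ArrayList<>();
        alternatives.add(term);
        alternatives.addAll(source.synonymsOf(term));
        return alternatives.size() == 1 ? term : "(" + String.join(" OR ", alternatives) + ")";
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = Map.of("snel", List.of("vlug", "rap"));
        SynonymSource source = t -> table.getOrDefault(t, List.of());
        System.out.println(expand("snel", source));  // snel
        System.out.println(expand("~snel", source)); // (snel OR vlug OR rap)
    }
}
```

Since nothing is stored in the index, changing the synonym table affects the next query immediately; the price is that the user has to ask for synonyms explicitly.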
Different horses for different courses, I understand your reasoning.
Regards Ard
>