Hello, and sorry for spamming, but I just want to share my findings/impressions. What I am posting here I am willing to implement and port to the Jackrabbit trunk (so if you bother to read it, and are positive about it, I will implement it :-) ).
(If you make it to the end of this mail, I also describe how simple it would become to add something like the SynonymProvider functionality that was just created in the trunk.)

First of all, the IndexingConfiguration: very promising! It is exactly what we need for better indexing and, consequently, better search results. Because, in the end, what good is a repository when customers can't find the results they are looking for? Storing, versioning, workflow: all very important, but no good when nobody can find their content (duh, obviously).

So, one part that bothers me is multilinguality (with language-specific stopwords, stemming, synonyms). Many customers these days want multilingual sites, and want to search them accordingly. Lucene already has quite some code for exactly this: see contrib/analyzers/src/java. Lucene has many more analyzers, and you can easily add your own.

AFAIU, there is a single configuration place where I can define the overall Jackrabbit analyzer that is used within one workspace, in repository.xml:

    <param name="analyzer" value="org.apache.lucene.analysis.standard.StandardAnalyzer"/>

But what I want is a per-property definable analyzer (I would give body_fr a French analyzer and body_de a German one; some properties, like zip codes, I might want indexed with a keyword analyzer). The best place for this, IMO, is the IndexingConfiguration: then, if you do not configure it, nothing changes for you. So, for example, the first index rule at http://wiki.apache.org/jackrabbit/IndexingConfiguration would change into:

    <index-rule nodeType="nt:unstructured" boost="2.0">
      <property analyzer="org.apache.lucene.analysis.de.GermanAnalyzer">text_de</property>
    </index-rule>

and during loading, we construct a Map of {jr-property, analyzer} (call it propertyAnalyzerMap).
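The loading step described above could look roughly like the following sketch. It is only an illustration under assumptions: the class name PropertyAnalyzerConfig and the method load are hypothetical, the rules are assumed to sit under a single root element, and the map holds analyzer class names rather than Analyzer instances so that the example needs nothing beyond the JDK. A real implementation would presumably instantiate each class (for instance via reflection) and would have to map JCR property names to the field names actually used in the Lucene index.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Jackrabbit: reads the proposed
// "analyzer" attribute from <property> elements of an indexing configuration.
final class PropertyAnalyzerConfig {

    // Returns property name -> analyzer class name for every <property>
    // element that carries an "analyzer" attribute. Properties without
    // the attribute are left out, so they would keep the default analyzer.
    static Map<String, String> load(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Map<String, String> propertyAnalyzerMap = new HashMap<String, String>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            // getAttribute returns an empty string when the attribute is absent
            String analyzerClass = prop.getAttribute("analyzer");
            if (analyzerClass.length() > 0) {
                propertyAnalyzerMap.put(prop.getTextContent().trim(), analyzerClass);
            }
        }
        return propertyAnalyzerMap;
    }
}
```

Because unconfigured properties never enter the map, an existing configuration without any analyzer attribute yields an empty map and nothing changes for it.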
Then, all we need to add is one Jackrabbit global analyzer that looks like:

    import java.io.Reader;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;

    class JRAnalyzer extends Analyzer {
        // filled from the IndexingConfiguration during loading
        Map propertyAnalyzerMap = new HashMap();
        Analyzer defaultAnalyzer = new StandardAnalyzer();

        public TokenStream tokenStream(String fieldName, Reader reader) {
            Analyzer analyzer = (Analyzer) propertyAnalyzerMap.get(fieldName);
            if (analyzer != null) {
                return analyzer.tokenStream(fieldName, reader);
            }
            // no analyzer configured for this property: use the default
            return defaultAnalyzer.tokenStream(fieldName, reader);
        }
    }

This very same JRAnalyzer is also used for the QueryParser in LuceneQueryBuilder, so this will also work for searching, IIUC.

So, WDYT? I can implement it and send a patch, but if the community is reluctant, I will have to do it for myself in a way that does not intrude on the jr code.

Example of the SynonymProvider mentioned at the top:

If my suggested changes are accepted, a dedicated SynonymProvider becomes superfluous, because synonym support is very easy to add on the fly. Suppose I always want full-text searching with Dutch synonyms on the "body" property of my nodes. This boils down to adding an analyzer for this property that extends the DutchAnalyzer in Lucene and adds synonym functionality (there is a very simple example in the "Lucene in Action" book). I think it is better to do synonyms during analysis (as opposed to the SynonymProvider in the jr trunk), and simply use an analyzer for it. Of course, one difference is that with the current SynonymProvider you have to specify explicitly that you want a synonym search (~term), while with an analyzer you define which properties should be indexed with a synonym analyzer, and they are searched accordingly (without having to specify it).

So WDYT? Again, sorry for mailing so much, just trying to sell my ideas :-)

--
Hippo
Oosteinde 11
1017WT Amsterdam
The Netherlands
Tel +31 (0)20 5224466
-------------------------------------------------------------
[EMAIL PROTECTED] / [EMAIL PROTECTED] / http://www.hippo.nl
--------------------------------------------------------------
