Hello, I am doing some work to shuffle things around and consolidate analyzers into what will hopefully be its own versioned module (such that you could use an older version with a newer Lucene core and we could remove "fake" Version and use real jar file versions).
For a while I have been thinking about how we might apply this to Solr, so it gets the same benefit. There are also other "problems" with analysis in Solr that I would like to fix along the way:

1. Solr, like Lucene, should be able to work with an older analyzers module for backwards compatibility purposes.
2. Solr users should optionally be able to use analyzers that are not in analyzers-common (smartcn, stempel, icu, ...) easily. Currently this is a tradeoff against the size of the Solr war file, so they are not included. It also seems silly to make Solr contribs just for 'more analyzers'.

The current idea I have is that Solr would not bundle analyzers-common.jar into its war file at all. Instead, all analyzers modules would also serve as plugins to Solr (you stick them in solrhome/lib). By default, Solr would just include analyzers-common this way, instead of in the war file itself.

So with this idea, analyzers are just a Solr plugin, and the default Solr install includes the ones it does today, so most users would not see the difference. But if a user wants Polish, Smart Chinese, or improved Unicode support, they would be able to drop in one of the additional analyzer modules easily.

The factories for Solr serve as a buffer to hide the implementation details, and I think they should be part of these analyzer modules, so that when you produce an analyzers artifact it is both a plugin to Lucene and a plugin to Solr. In my opinion, this factory interface is very well defined and achieves for Solr <-> analyzers what we want to achieve for Lucene <-> analyzers: a minimal interface.

Down the road, we could look at improving on this further. For example, any given release of analyzers artifacts could include additional artifacts that "go with it":

1. example configuration files, like stopword lists for different languages
2. example schema definitions (even snippets) for Solr users as a documentation artifact, so they know how to use this stuff.
...
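To make the "factory as a buffer" idea a bit more concrete, here is a rough sketch of the shape I have in mind. Everything in it is hypothetical and simplified for illustration: FilterFactory, LowerCaseFactory, and load are made-up names, tokens are plain strings, and none of this is Solr's actual factory API. The only point is that the host side sees nothing but a class name from configuration plus a small interface.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

public class AnalyzerPluginSketch {

    /** Hypothetical minimal contract between the host and an analyzers module. */
    interface FilterFactory {
        void init(Map<String, String> args);   // arguments taken from configuration
        UnaryOperator<String> create();        // stand-in for building a real token filter
    }

    /** Hypothetical factory that an analyzers module could ship in its jar. */
    public static class LowerCaseFactory implements FilterFactory {
        private Locale locale = Locale.ROOT;

        @Override
        public void init(Map<String, String> args) {
            String tag = args.get("locale");
            if (tag != null) {
                locale = Locale.forLanguageTag(tag);
            }
        }

        @Override
        public UnaryOperator<String> create() {
            return token -> token.toLowerCase(locale);
        }
    }

    /** Host side: instantiates a factory knowing only its class name and the interface. */
    static FilterFactory load(String className, Map<String, String> args) {
        try {
            Class<?> clazz = Class.forName(className);
            FilterFactory factory = (FilterFactory) clazz.getDeclaredConstructor().newInstance();
            factory.init(args);
            return factory;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load factory " + className, e);
        }
    }

    /** Loads the sample factory by name and runs one token through it. */
    static String demo() {
        FilterFactory factory = load(LowerCaseFactory.class.getName(), new HashMap<>());
        return factory.create().apply("Lucene");
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this sketch, swapping in a different analyzers module would only mean putting a different jar on the classpath and naming a different factory class in the configuration; the host code in load stays the same.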
Thoughts, alternative proposals?

-- 
Robert Muir
