On 01/27/2010 05:54 PM, Paul Libbrecht wrote:
>
> Hello devs,
>
> I'm trying a few optimizations of the Lucene plugin and am trying to keep
> them flexible and not too Intergeo- or Curriki-specific.
>
> The fact is that this plugin uses Lucene in a very blind and heavy
> fashion, which gives a lot of power (but that power is not used). Mostly,
> I'd like, in a configurable way:
>
> - to decide whether to store and/or index some objects or object
> properties
> - to decide to exclude some documents
> - to decide to use particular analyzers for particular fields (in
> particular the "token fields")
>
> I know it would be almost possible by replacing Lucene with Solr and
> letting users tune Solr.
> But maybe it is simple to have the configurability done in XWiki.
>
> Probably the nicest way I see this would be the way the notifications
> are done: a central field indicates the page of a Groovy source which
> should implement an interface such as "LuceneIndexProfile", which would
> answer such questions (maybe even including some more, such as the Data
> classes).
>
> Is the nicest option above easy?
> Do we prefer an XML configuration?
Hi Paul,

I'm not sure I understood your approach; could you explain it in more
detail? What do you mean by "central field"?

The way I see it, each indexed field will have a reference, given by some
coordinates (this is related to the thread about object and property
references), such as "wiki:Space.Document^classname[index].property".

There should be a collection of filters (components implementing
LuceneIndexFilter) which have the following method:

    boolean filter(Reference entity, LuceneIndexProfile profile);

The meaning is the following:
- entity is the entity to process (a document, an object property, or an
  attachment)
- profile is the indexing profile built by the filters: it is initialized
  with some default values in the Lucene plugin and modified by the filters
  as it passes through them
- returning true means that the filtering process should stop, since the
  current filter decided that the profile is ready (for example, if a filter
  decided that the document should not be indexed due to security
  restrictions, then it's useless to run all the other filters); by default,
  filters return false, letting the other filters adjust the profile
- each filter looks at the reference and, based on some internal rules,
  decides whether it should alter the profile for this entity, and whether
  it considers any more filtering useful/needed

After the filtering is done, the plugin indexes (or not) the entity
according to the values in the profile.

This means that we could have several components affecting the Lucene
behavior, each one with particular goals in mind (security, performance,
searchability), and each one with its own configuration.

So, what needs to be done (besides writing the code) is to define the
possible settings in the LuceneIndexProfile, define the filters needed, and
decide how to configure them. XML files on the server are an option, but
not a flexible enough one. Maybe objects inside the wiki would give more
flexibility to application developers.
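A minimal Java sketch of the filter chain described above, to make the control flow concrete. It assumes a simplified, hypothetical `Reference` class that only holds the serialized reference string; the profile fields (`index`, `store`, `analyzer`), the `LuceneIndexFilterChain` runner and the two example filters are hypothetical illustrations, not existing XWiki or Lucene APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an entity reference such as
// "wiki:Space.Document^classname[index].property"; only the string is kept.
final class Reference {
    final String serialized;
    Reference(String serialized) { this.serialized = serialized; }
}

// Hypothetical indexing settings; the initial values play the role of the
// defaults the Lucene plugin would set before the filters run.
final class LuceneIndexProfile {
    boolean index = true;          // index the entity at all?
    boolean store = true;          // store the raw value in the index?
    String analyzer = "standard";  // hypothetical analyzer identifier
}

// The filter contract proposed in the thread.
interface LuceneIndexFilter {
    // Returns true to stop the chain: the profile is considered ready.
    boolean filter(Reference entity, LuceneIndexProfile profile);
}

// Hypothetical runner: applies the filters in order to a fresh profile.
final class LuceneIndexFilterChain {
    private final List<LuceneIndexFilter> filters = new ArrayList<>();

    LuceneIndexFilterChain add(LuceneIndexFilter filter) {
        filters.add(filter);
        return this;
    }

    // The caller then indexes (or skips) the entity according to the result.
    LuceneIndexProfile profileFor(Reference entity) {
        LuceneIndexProfile profile = new LuceneIndexProfile();
        for (LuceneIndexFilter filter : filters) {
            if (filter.filter(entity, profile)) {
                break; // this filter decided no further filtering is needed
            }
        }
        return profile;
    }
}

// Two hypothetical filters, one per goal mentioned in the thread.
final class ExampleFilters {
    // Security-style rule: never index the (hypothetical) "Secret" space,
    // and stop the chain since no other setting matters then.
    static final LuceneIndexFilter SKIP_SECRET_SPACE = (entity, profile) -> {
        if (entity.serialized.contains(":Secret.")) {
            profile.index = false;
            return true;
        }
        return false;
    };

    // Searchability-style rule: use a (hypothetical) "keyword" analyzer for
    // properties named "tags", then let later filters continue.
    static final LuceneIndexFilter KEYWORD_FOR_TAGS = (entity, profile) -> {
        if (entity.serialized.endsWith(".tags")) {
            profile.analyzer = "keyword";
        }
        return false;
    };
}
```

Since each filter only sees the reference and the shared profile, a security filter shipped in a jar and a searchability filter written as a Groovy page could be registered in the same chain; the ordering of the filters then decides which rule wins.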
So, another thing to do is to decide the fields needed in the class for
such configuration objects. Of course, if somebody needs a new filter, it's
easy to add a new jar or write a new Groovy page in the wiki.

-- 
Sergiu Dumitriu
http://purl.org/net/sergiu/
_______________________________________________
devs mailing list
http://lists.xwiki.org/mailman/listinfo/devs

