Re: [xwiki-devs] approaching mild optimizations of Lucene plugin

Sergiu Dumitriu Sun, 31 Jan 2010 11:33:13 -0800

On 01/27/2010 05:54 PM, Paul Libbrecht wrote:
>
> Hello devs,
>
> I'm trying a few optimizations of the Lucene plugin and try to keep
> this flexible and not too intergeo or curriki specialized.
>
> The fact is that this plugin uses Lucene in a very blind and heavy
> fashion, which gives a lot of power (but which is not used). Mostly,
> I'd like, in a configurable way:
>
> - to decide to store and/or index or not some objects or object
> properties
> - to decide to exclude some documents
> - to decide to use particular analyzers for particular fields (in
> particular the "token fields")
>
> I know it would be almost possible by replacing lucene by solr and
> letting users tune solr.
> But maybe it is simple to have the configurability done in xwiki.
>
> Probably the nicest way I see this would be the way the notifications
> are done: a central field indicates the page of a groovy source which
> should implement such an interface as "LuceneIndexProfile" which would
> add such questions (maybe even including some more such as the Data
> classes).
>
> Is the nicest way above easy?
> Do we prefer an XML configuration?


Hi Paul,

I'm not sure I understood your approach. Could you explain it in more 
detail? What do you mean by "central field"?


The way I see it, each indexed field will have a reference, given by 
some coordinates (this is related to the thread about object and 
properties references), such as 
"wiki:Space.Document^classname[index].property". There should be a 
collection of filters (components implementing LuceneIndexFilter) which 
have the following method:

boolean filter(Reference entity, LuceneIndexProfile profile);

The meaning is the following:
- entity is the entity to process (could be a document, an object 
property, an attachment)
- profile is the indexing profile built by the filters, initialized with 
some default values in the Lucene Plugin, and modified by the filters as 
it passes through them
- returning true means that the filtering process should stop, since the 
current filter decided that the profile is ready (for example if a 
filter decided that the document should not be indexed due to security 
restrictions, then it's useless to run all the other filters); by 
default filters return false, letting the other filters adjust the 
profile
- each filter looks at the reference and, based on some internal rules, 
decides whether it should alter the profile for this entity, and whether 
it considers that no more filtering is useful/needed

After the filtering is done, the plugin indexes (or not) the entity 
according to the values in the profile.
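
To make the flow above concrete, here is a minimal sketch in Java. Only 
the filter(...) signature comes from this mail; everything else is a 
hypothetical placeholder: the fields of Reference and LuceneIndexProfile, 
the two example filters, and the FilterChain helper. The real profile 
settings still have to be decided.

```java
import java.util.List;

// Hypothetical stand-in for an entity reference such as
// "wiki:Space.Document^classname[index].property".
final class Reference {
    final String value;
    Reference(String value) { this.value = value; }
}

// Hypothetical profile: these three settings are placeholders only,
// initialized with default values before the filters run.
final class LuceneIndexProfile {
    boolean indexed = true;
    boolean stored = true;
    String analyzer = "default";
}

interface LuceneIndexFilter {
    // Returns true when the profile is final and the remaining
    // filters should be skipped; false lets the chain continue.
    boolean filter(Reference entity, LuceneIndexProfile profile);
}

// Example filter (assumption): excludes every entity whose reference
// starts with a given prefix, and stops the chain.
final class ExcludePrefixFilter implements LuceneIndexFilter {
    private final String prefix;
    ExcludePrefixFilter(String prefix) { this.prefix = prefix; }

    @Override
    public boolean filter(Reference entity, LuceneIndexProfile profile) {
        if (entity.value.startsWith(prefix)) {
            profile.indexed = false;
            return true; // nothing left to decide for this entity
        }
        return false;
    }
}

// Example filter (assumption): selects an analyzer name for "tags"
// properties and lets the other filters keep adjusting the profile.
final class TokenFieldFilter implements LuceneIndexFilter {
    @Override
    public boolean filter(Reference entity, LuceneIndexProfile profile) {
        if (entity.value.endsWith(".tags")) {
            profile.analyzer = "keyword";
        }
        return false;
    }
}

final class FilterChain {
    // Runs the filters in order until one of them returns true,
    // then hands back the profile the plugin would index with.
    static LuceneIndexProfile buildProfile(Reference entity,
                                           List<LuceneIndexFilter> filters) {
        LuceneIndexProfile profile = new LuceneIndexProfile();
        for (LuceneIndexFilter f : filters) {
            if (f.filter(entity, profile)) {
                break;
            }
        }
        return profile;
    }
}
```

The plugin would then call something like FilterChain.buildProfile(...) 
for each entity and skip it when profile.indexed is false. The order of 
the filters matters here, since an early "true" hides the later ones.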

This means that we could have several components affecting the Lucene 
behavior, each one with particular goals in mind (security, performance, 
searchability), and each one with its own configuration.


So, what needs to be done (besides writing the code) is to define the 
possible settings in the LuceneIndexProfile, define the filters needed, 
and decide how to configure them. XML files on the server are an option, 
but not a flexible enough one. Maybe objects inside the wiki will give 
more flexibility to application developers. So, another thing to do is 
decide the fields needed in such a class.

Of course, if somebody needs a new filter, it's easy to add a new jar or 
write a new Groovy page in the wiki.

-- 
Sergiu Dumitriu
http://purl.org/net/sergiu/