Here's a short summary of what I implemented in the end (rough, illustrative sketches of these pieces are included at the end of this message):

* I'm using an encoding scheme similar to URL-encoding to support special characters in field names. I didn't use URL-encoding directly because '+' (plus) and '%' (percent) have special meaning in the Solr query syntax. Also, I didn't want to encode Unicode letters.
  E.g. "Somé Spâce.Bob's Claß" is encoded as "Somé$20Spâce.Bob$27s$20Claß"
E.g. "Somé Spâce.Bob's Claß" is encoded as "Somé$20Spâce.Bob$27s$20Claß" * I wanted to be able to extract the class and property reference from a field name in order to display the location where the search text has been found. I couldn't use the default class / property reference serialization syntax because '\' and '^' have special meaning in the Solr query syntax. So I implemented a simple serialization syntax that uses only '.' as entity separator and the dot is escaped by repeating it. E.g. "wiki:Some\.Space.My\.Class^color" is serialized as "wiki.Some..Space.My..Class.color" * I added the following fields to a document's index: object : all types of objects found on the indexed document object.Space.Class : collects values from all Space.Class properties property.Space.Class.propName : indexes the values of Space.Class^propName (multiple values if there are multiple objects of type Space.Class) * object.* and property.* are multilingual fields so they are indexed in multiple languages. I added support for dynamic aliases (for dynamic fields) so we can write object:Blog.BlogPostClass AND property.Blog.BlogPostClass.title:text AND object.XWiki.TagClass:news and it will be expanded into object:Blog.BlogPostClass AND (property.Blog.BlogPostClass.title_en:text OR property.Blog.BlogPostClass.title_fr:text OR ...) AND (object.XWiki.TagClass_en:news OR object.XWiki.TagClass_fr:news OR ...) NOTE: Solr doesn't support dynamic fields as default fields, i.e. as fields that are matched when you search for free text (without field:value in the query). This is not a problem for the search results, as dynamic fields like object.* and property.* are copied and aggregated in 'objcontent' which is a default field. The issue is that we can't know what is exactly the XClass property that was matched, we just know that the free search text was found inside an object. WDYT? I can still make adjustments before 5.3 final if you think something is wrong. Thanks, Marius On Fri, Nov 15, 2013 at 9:01 AM, Paul Libbrecht <[email protected]> wrote: > Hello Marius, > >>> I would suggest to generate the schema and config, reloading every time >>> there's a class change. >> >> That would mean re-indexing everything right? It would take to much time. > > No for most cases. > A Lucene index is "just" a heap of "terms". > If you change the schema in that you add a new field, the impact on the index > is zero. > If you rename a field, you need to reindex. > If you change the type of a field (or its analyzer) then you have to reindex. > If you delete a field, you leave some dirt, you'd have to reindex if you > rewake this field name. > >>> I believe that the query-expansion step, from title:x to title-en:x >>> title-ft:x, etc… is best to be controlled early so that applications can >>> change that somehow. In curriki, this is done with a custom query-component >>> which uses the query-parser (with a default-field which does not exist) >>> then rewrites the query objects (which is a fairly easy game). >> >> That's actually what I'm currently investigating. I'll try to extend >> the ExtendedDismaxQParserPlugin, let it do its query parsing and then >> expand the query with more query objects when the "field" name matches >> some pattern (e.g. property_*) > > I am not sure it's best practice, but as an application developer, I would > enjoy if this code was in a Groovy page. 

