On Wed, Nov 13, 2013 at 8:08 PM, Ludovic Dubost <[email protected]> wrote:
> Hi Marius,
>
> I have a quick question as I start reading your proposal. I don't see
> anything about multi-language indexing.
> I remember that in the current Solr implementation there are multiple
> fields for each language. Would there be a field for each language indexed
> for each property?

Yes. Right now I'm struggling to find a way to define an alias for a
group of dynamic fields. For the document title we have this in
solrconfig.xml:

<str name="f.title.qf">title__ title_ar title_bg title_ca ...</str>

which makes 'title' an alias for all its translations and allows us to
write title:text in the search query. I need to do the same, but
dynamically, for each object property:

property_Blog.BlogPostClass_title =
    property_Blog.BlogPostClass_title__,
    property_Blog.BlogPostClass_title_en,
    property_Blog.BlogPostClass_title_fr, ...

I'll keep you posted.

Thanks,
Marius

>
> Ludovic
>
>
> 2013/10/14 Marius Dumitru Florea <[email protected]>
>
>> I started writing
>> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema . I need help
>> with two things:
>>
>> * test cases
>> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HTestCases
>> * if time permits, review the proposal, especially
>> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HAMixedApproach
>> .
>>
>> Thanks,
>> Marius
>>
>>
>> On Fri, Oct 11, 2013 at 12:55 PM, Marius Dumitru Florea
>> <[email protected]> wrote:
>> > Hi devs,
>> >
>> > This is a very important question so think carefully. Let me explain:
>> >
>> > In XWiki (model) we have a few entity types. There are *wikis* which
>> > have *spaces* which have *documents*. A document can have *objects*
>> > and *attachments*. A document can also define a *class*.
>> >
>> > At the same time we like to say that in XWiki "everything is a
>> > document" because everything revolves around documents. The document
>> > is the central notion.
>> >
>> > We can query the database (using HQL or XWQL) for any of the
>> > previously mentioned entities but what should a Solr query return
>> > (semantically)? In other words:
>> >
>> > * are you searching for an object without caring about the document
>> > that holds the object? Same for an object property.
>> > * how often are you searching for an attachment without caring about
>> > the document that holds the attachment?
>> > * are you searching for a class or for the document that defines that
>> > class?
>> > * are you searching for a wiki without caring about the documents it
>> > contains? Same for a space.
>> >
>> > IMO the result of a Solr query should be, semantically, a list of
>> > documents. But maybe I'm wrong.
>> >
>> > -----------------------
>> > Technical Details
>> > -----------------------
>> >
>> > Unlike a relational database, a Solr/Lucene index has a single 'table'.
>> > So normally you index a single entity type. Each row in the index
>> > represents an entity of that type. As a consequence the result of a
>> > Solr query is semantically a list of entities of that type. In our
>> > case the entity type is (naturally) *document*.
>> >
>> > If you want to index more entity types (e.g. index attachments and
>> > objects _separately_, not as part of a document) then, since there is
>> > only one 'table' in the index, you need to add a 'type' column that
>> > specifies the type of entity you have on each row (e.g. type=document,
>> > type=attachment, type=object etc.). The result of a Solr query is now,
>> > semantically, a list of different entity types, unless you filter by a
>> > specific type. It smells like a hack to me.
>> >
>> > Let's imagine what happens if we want to search for blog posts that
>> > have a specific tag. With the first approach this is easy because all
>> > the (indexed) information is on a single row. With the second approach
>> > this is considerably more complex because the information is spread
>> > over multiple rows:
>> >
>> > * one row with type=document for the blog post document
>> > * one row with type=object for the blog post object
>> > * one row with type=object for the tag object
>> >
>> > In a relational database when you have the information spread in
>> > multiple places (tables) you do joins.
>> > Fortunately (you would say)
>> > Solr supports joins. In this particular case we would have to perform
>> > 2 joins which means:
>> >
>> > index X index X index
>> >
>> > where X represents the cartesian product. The document name would be
>> > the join key. Pretty complex even before trying to write this in Solr
>> > query syntax.
>> >
>> > So basically the question becomes: is it worth indexing more entities
>> > _separately_ instead of indexing just documents (with info about their
>> > objects and attachments) considering the complexity that it brings in
>> > writing Solr queries? Do we search for objects and attachments alone
>> > as separate entities often enough to justify this complexity? My
>> > answer is no.
>> >
>> > Thanks,
>> > Marius
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
>
>
>
> --
> Ludovic Dubost
> Founder and CEO
> Blog: http://blog.ludovic.org/
> XWiki: http://www.xwiki.com
> Skype: ldubost GTalk: ldubost
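
For illustration only, here is a minimal sketch of the per-property alias mentioned in the reply at the top of this thread. It assumes the alias is sent as an edismax request parameter (f.<alias>.qf) instead of being listed in solrconfig.xml. The function name alias_params, the language list and the idea of building the parameter per request are assumptions, not what XWiki actually does; only the field naming pattern (property_Blog.BlogPostClass_title__, ..._en, ..._fr) comes from the thread. Whether Solr accepts dots inside an alias name is not verified here.

```python
def alias_params(alias, languages):
    """Build an edismax field alias parameter for one property.

    Hypothetical helper: maps `alias` to its default-locale field
    (`<alias>__`) plus one field per language (`<alias>_<lang>`).
    """
    fields = [alias + "__"] + [f"{alias}_{lang}" for lang in languages]
    return {f"f.{alias}.qf": " ".join(fields)}


# Request parameters for a query that uses the alias as a field name.
# The language list is an assumption; a real list would come from the wiki.
params = {"defType": "edismax", "q": "property_Blog.BlogPostClass_title:text"}
params.update(alias_params("property_Blog.BlogPostClass_title", ["en", "fr"]))
```

The point of the sketch is that the alias can be computed from the property name and the list of indexed languages, which is the "dynamic" part that a static solrconfig.xml entry does not cover.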

