Hi Marius,

I have a quick question after starting to read your proposal. I don't see anything about multi-language indexing. I remember that in the current Solr implementation there are multiple fields for each language. Would there be a field for each language indexed for each property?
Ludovic

2013/10/14 Marius Dumitru Florea <[email protected]>

> I started writing
> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema . I need help
> with two things:
>
> * test cases
> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HTestCases
> * if time permits, review the proposal, especially
> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HAMixedApproach
> .
>
> Thanks,
> Marius
>
>
> On Fri, Oct 11, 2013 at 12:55 PM, Marius Dumitru Florea
> <[email protected]> wrote:
> > Hi devs,
> >
> > This is a very important question so think carefully. Let me explain:
> >
> > In XWiki (model) we have a few entity types. There are *wikis* which
> > have *spaces* which have *documents*. A document can have *objects*
> > and *attachments*. A document can also define a *class*.
> >
> > At the same time we like to say that in XWiki "everything is a
> > document" because everything revolves around documents. The document
> > is the central notion.
> >
> > We can query the database (using HQL or XWQL) for any of the
> > previously mentioned entities but what should a Solr query return
> > (semantically)? In other words:
> >
> > * are you searching for an object without caring about the document
> > that holds the object? Same for an object property.
> > * how often are you searching for an attachment without caring about
> > the document that holds the attachment?
> > * are you searching for a class or for the document that defines that
> > class?
> > * are you searching for a wiki without caring about the documents it
> > contains? Same for a space.
> >
> > IMO the result of a Solr query should be, semantically, a list of
> > documents. But maybe I'm wrong.
> >
> > -----------------------
> > Technical Details
> > -----------------------
> >
> > Unlike a relational database, a Solr/Lucene index has a single 'table'.
> > So normally you index a single entity type. Each row in the index
> > represents an entity of that type.
> > As a consequence the result of a
> > Solr query is semantically a list of entities of that type. In our
> > case the entity type is (naturally) *document*.
> >
> > If you want to index more entity types (e.g. index attachments and
> > objects _separately_, not as part of a document) then, since there is
> > only one 'table' in the index, you need to add a 'type' column that
> > specifies the type of entity you have on each row (e.g. type=document,
> > type=attachment, type=object etc.). The result of a Solr query is now,
> > semantically, a list of different entity types, unless you filter by a
> > specific type. It smells like a hack to me.
> >
> > Let's imagine what happens if we want to search for blog posts that
> > have a specific tag. With the first approach this is easy because all
> > the (indexed) information is on a single row. With the second approach
> > this is considerably more complex because the information is spread
> > across multiple rows:
> >
> > * one row with type=document for the blog post document
> > * one row with type=object for the blog post object
> > * one row with type=object for the tag object
> >
> > In a relational database when you have the information spread in
> > multiple places (tables) you do joins. Fortunately (you would say)
> > Solr supports joins. In this particular case we would have to perform
> > 2 joins which means:
> >
> > index X index X index
> >
> > where X represents the cartesian product. The document name would be
> > the join key. Pretty complex even before trying to write this in Solr
> > query syntax.
> >
> > So basically the question becomes: is it worth indexing more entities
> > _separately_ instead of indexing just documents (with info about their
> > objects and attachments) considering the complexity that it brings in
> > writing Solr queries? Do we search for objects and attachments alone
> > as separate entities often enough to justify this complexity? My
> > answer is no.
> >
> > Thanks,
> > Marius
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>

-- 
Ludovic Dubost
Founder and CEO
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost
GTalk: ldubost
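
To make the trade-off in the quoted message concrete, here is a minimal in-memory sketch of the two indexing approaches it compares, using the "blog posts that have a specific tag" example. It uses plain Python lists rather than Solr, and every field name and sample row (`name`, `classes`, `tags`, `type`, `document`, `class`, the page names) is an assumption made for illustration, not the schema from the proposal.

```python
# Illustrative only: plain Python stand-ins for index rows, not Solr calls.
# All field names and sample values are hypothetical.

BLOG_POST_CLASS = "Blog.BlogPostClass"
TAG_CLASS = "XWiki.TagClass"

# Approach 1: one row per document; object data is flattened into that row.
doc_index = [
    {"name": "Blog.Post1", "classes": [BLOG_POST_CLASS, TAG_CLASS], "tags": ["solr", "search"]},
    {"name": "Blog.Post2", "classes": [BLOG_POST_CLASS, TAG_CLASS], "tags": ["release"]},
    {"name": "Main.WebHome", "classes": [TAG_CLASS], "tags": ["solr"]},
]

def blog_posts_with_tag_single_row(index, tag):
    """Everything needed is on one row, so a single filter is enough."""
    return [
        row["name"]
        for row in index
        if BLOG_POST_CLASS in row["classes"] and tag in row["tags"]
    ]

# Approach 2: one row per entity, distinguished by a 'type' column.
mixed_index = [
    {"type": "document", "name": "Blog.Post1"},
    {"type": "object", "document": "Blog.Post1", "class": BLOG_POST_CLASS},
    {"type": "object", "document": "Blog.Post1", "class": TAG_CLASS, "tags": ["solr", "search"]},
    {"type": "document", "name": "Blog.Post2"},
    {"type": "object", "document": "Blog.Post2", "class": BLOG_POST_CLASS},
    {"type": "object", "document": "Blog.Post2", "class": TAG_CLASS, "tags": ["release"]},
    {"type": "document", "name": "Main.WebHome"},
    {"type": "object", "document": "Main.WebHome", "class": TAG_CLASS, "tags": ["solr"]},
]

def blog_posts_with_tag_joined(index, tag):
    """The same question now needs two joins on the document name."""
    documents = [row for row in index if row["type"] == "document"]
    objects = [row for row in index if row["type"] == "object"]
    result = []
    for doc in documents:
        # Join 1: document row -> blog post object row.
        is_blog_post = any(
            obj["document"] == doc["name"] and obj["class"] == BLOG_POST_CLASS
            for obj in objects
        )
        # Join 2: document row -> tag object row carrying the wanted tag.
        has_tag = any(
            obj["document"] == doc["name"]
            and obj["class"] == TAG_CLASS
            and tag in obj.get("tags", [])
            for obj in objects
        )
        if is_blog_post and has_tag:
            result.append(doc["name"])
    return result
```

Both functions answer the same question, but the second has to correlate three kinds of rows through the document name, which is the "index X index X index" cost described above.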

