As for the 'searching for a "wiki"' example, this is a good case where in reality we just need to search for documents, since one wiki = one document. The only case where we could need to search for objects is when one document = multiple objects, which is not that common, and it is even less common that we need to make full-text searches on the object entities separately.

Ludovic

2013/10/11 Ludovic Dubost <[email protected]>

> Hi,
>
> From my point of view we usually search mostly for two types of things:
>
> - documents
> - attachments
>
> But we should be able to filter these results on multiple property values
> of any object. This is true also for documents and for attachments.
> It is also interesting to be able to present results differently depending
> on the document we get (if it's a meeting document or a user document we
> display things differently).
> Being able to search for attachments separately is very important.
>
> As for objects, most of the time we search for documents that have this
> specific object.
> There is however a use case I see where it could be interesting to search
> in individual objects.
> For example this is the case for comments. It could be interesting to make
> a search in all comments.
>
> Another example could be tasks. Suppose you add tasks inside documents
> associated to some content of the document (like annotations).
> You might want to be able to make some nice search on all the tasks and
> then display a link to the document in which the task is, but not the other
> way around.
>
> Now I think this use case could be optional, so we don't necessarily need
> to index all objects of all classes. We could have some config which tells
> to make an index for all comment objects or all task objects. I think we
> already had an object index in Lucene and I don't remember if we have ever
> used it.
>
> I don't think we need an index on all properties.
>
> Ludovic
>
>
> 2013/10/11 Marius Dumitru Florea <[email protected]>
>
>> Hi devs,
>>
>> This is a very important question so think carefully. Let me explain:
>>
>> In XWiki (model) we have a few entity types. There are *wikis* which
>> have *spaces* which have *documents*. A document can have *objects*
>> and *attachments*. A document can also define a *class*.
>>
>> At the same time we like to say that in XWiki "everything is a
>> document" because everything revolves around documents. The document
>> is the central notion.
>>
>> We can query the database (using HQL or XWQL) for any of the
>> previously mentioned entities but what should a Solr query return
>> (semantically)? In other words:
>>
>> * are you searching for an object without caring about the document
>> that holds the object? Same for an object property.
>> * how often are you searching for an attachment without caring about
>> the document that holds the attachment?
>> * are you searching for a class or for the document that defines that
>> class?
>> * are you searching for a wiki without caring about the documents it
>> contains? Same for a space.
>>
>> IMO the result of a Solr query should be, semantically, a list of
>> documents. But maybe I'm wrong.
>>
>> -----------------------
>> Technical Details
>> -----------------------
>>
>> Unlike a relational database, a Solr/Lucene index has a single 'table'.
>> So normally you index a single entity type. Each row in the index
>> represents an entity of that type. As a consequence the result of a
>> Solr query is semantically a list of entities of that type. In our
>> case the entity type is (naturally) *document*.
>>
>> If you want to index more entity types (e.g. index attachments and
>> objects _separately_, not as part of a document) then, since there is
>> only one 'table' in the index, you need to add a 'type' column that
>> specifies the type of entity you have on each row (e.g. type=document,
>> type=attachment, type=object etc.). The result of a Solr query is now,
>> semantically, a list of different entity types, unless you filter by a
>> specific type. It smells like a hack to me.
>>
>> Let's imagine what happens if we want to search for blog posts that
>> have a specific tag. With the first approach this is easy because all
>> the (indexed) information is on a single row. With the second approach
>> this is considerably more complex because the information is spread on
>> multiple rows:
>>
>> * one row with type=document for the blog post document
>> * one row with type=object for the blog post object
>> * one row with type=object for the tag object
>>
>> In a relational database when you have the information spread in
>> multiple places (tables) you do joins. Fortunately (you would say)
>> Solr supports joins. In this particular case we would have to perform
>> 2 joins which means:
>>
>> index X index X index
>>
>> where X represents the cartesian product. The document name would be
>> the join key. Pretty complex even before trying to write this in Solr
>> query syntax.
>>
>> So basically the question becomes: is it worth indexing more entities
>> _separately_ instead of indexing just documents (with info about their
>> objects and attachments) considering the complexity that it brings in
>> writing Solr queries? Do we search for objects and attachments alone
>> as separate entities often enough to justify this complexity? My
>> answer is no.
>>
>> Thanks,
>> Marius
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
>
>
> --
> Ludovic Dubost
> Founder and CEO
> Blog: http://blog.ludovic.org/
> XWiki: http://www.xwiki.com
> Skype: ldubost GTalk: ldubost
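
To make the trade-off described under "Technical Details" in the quoted message concrete, here is a minimal Python sketch of the two indexing approaches, using plain lists of dictionaries as a stand-in for the index. It does not use Solr or its query syntax. The field names (`name`, `classes`, `tags`, `type`, `document`, `class`), the sample documents and the helper functions are all assumptions made for illustration, not the actual XWiki Solr schema. The set intersection on the document name only approximates what the two joins would have to do.

```python
# Approach 1: one row per document; object data is flattened into that row.
# All field names and sample rows are hypothetical.
doc_index = [
    {"name": "Blog.Post1",
     "classes": ["Blog.BlogPostClass", "XWiki.TagClass"],
     "tags": ["solr", "search"]},
    {"name": "Blog.Post2",
     "classes": ["Blog.BlogPostClass", "XWiki.TagClass"],
     "tags": ["release"]},
    {"name": "Main.WebHome",
     "classes": ["XWiki.TagClass"],
     "tags": ["solr"]},
]


def posts_with_tag_single(index, tag):
    """Everything needed is on one row, so a single filter is enough."""
    return sorted(
        row["name"]
        for row in index
        if "Blog.BlogPostClass" in row["classes"] and tag in row["tags"]
    )


def explode(index):
    """Approach 2: one row per entity, distinguished by a 'type' column."""
    rows = []
    for doc in index:
        rows.append({"type": "document", "name": doc["name"]})
        for cls in doc["classes"]:
            row = {"type": "object", "document": doc["name"], "class": cls}
            if cls == "XWiki.TagClass":
                row["tags"] = doc["tags"]
            rows.append(row)
    return rows


def posts_with_tag_joined(rows, tag):
    """The information is spread over three rows per blog post, so the
    three row sets must be combined with the document name as join key."""
    docs = {r["name"] for r in rows if r["type"] == "document"}
    posts = {
        r["document"] for r in rows
        if r["type"] == "object" and r["class"] == "Blog.BlogPostClass"
    }
    tagged = {
        r["document"] for r in rows
        if r["type"] == "object"
        and r["class"] == "XWiki.TagClass"
        and tag in r["tags"]
    }
    return sorted(docs & posts & tagged)


entity_index = explode(doc_index)
print(posts_with_tag_single(doc_index, "solr"))
print(posts_with_tag_joined(entity_index, "solr"))
```

Both functions return the same documents, but the second one has to touch three row sets and reconcile them by document name, which is the extra complexity the quoted message is worried about.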

