Re: [xwiki-devs] [Solr] What do we search for?

Ludovic Dubost Fri, 11 Oct 2013 03:29:15 -0700

As for the "search for a wiki" example, this is a good case where we really
just need to search for documents, since one wiki = one document.
The only case where we might need to search for objects is when one document
holds multiple objects. That is not very common, and it is even less common
that we need to run full-text searches on the object entities separately.


Ludovic


2013/10/11 Ludovic Dubost <[email protected]>

> Hi,
>
> From my point of view, we mostly search for two types of things:
>
> - documents
> - attachments
>
> But we should be able to filter these results on multiple property values
> of any object. This is true for both documents and attachments.
> It is also useful to be able to present results differently depending
> on the document we get (if it's a meeting document or a user document, we
> display things differently).
> Being able to search for attachments separately is very important.
>
> As for objects, most of the time we search for documents that have a
> specific object.
> There is, however, one use case where it could be interesting to search
> individual objects.
> Comments are an example: it could be interesting to search across all
> comments.
>
> Another example could be tasks. Suppose you add tasks inside documents,
> associated with some content of the document (like annotations).
> You might want to run a search across all the tasks and then display a
> link to the document containing each task, rather than the other way
> around.
>
> Now I think this use case could be optional, so we don't necessarily need
> to index all objects of all classes. We could have a configuration option
> that enables a separate index for all comment objects or all task objects.
> I think we already had an object index in Lucene, and I don't remember
> whether we ever used it.
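The configurable per-class object index could look roughly like this minimal sketch, assuming hypothetical typed rows: the document row is always written, and an object gets its own row only when its class is listed in a configuration set. `INDEXED_OBJECT_CLASSES`, `rows_for` and `Tasks.TaskClass` are hypothetical; `XWiki.XWikiComments` is the standard XWiki comment class.

```python
# Hypothetical configuration: objects of these classes also get their own rows.
# "XWiki.XWikiComments" is XWiki's comment class; "Tasks.TaskClass" is made up.
INDEXED_OBJECT_CLASSES = {"XWiki.XWikiComments", "Tasks.TaskClass"}


def rows_for(doc_name, objects, indexed_classes=INDEXED_OBJECT_CLASSES):
    """Always emit the document row; add object rows only for configured classes."""
    rows = [{"type": "document", "id": doc_name,
             "class": sorted({cls for cls, _ in objects})}]
    for i, (cls, props) in enumerate(objects):
        if cls in indexed_classes:
            rows.append({"type": "object", "id": f"{doc_name}/{cls}/{i}",
                         "doc": doc_name, "class": cls, **props})
    return rows


rows = rows_for("Sandbox.Meeting", [
    ("XWiki.XWikiComments", {"comment": "Looks good"}),
    ("XWiki.TagClass", {"tags": "solr"}),
])
# Search all comments, then link back to the document that holds each one.
comments = [r for r in rows
            if r["type"] == "object" and r["class"] == "XWiki.XWikiComments"]
print(len(rows), [(c["comment"], c["doc"]) for c in comments])
# 2 [('Looks good', 'Sandbox.Meeting')]
```

The tag object stays folded into the document row, so only the classes someone explicitly asked for pay the cost of extra rows.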
>
> I don't think we need an index on all properties.
>
> Ludovic
>
>
>
> 2013/10/11 Marius Dumitru Florea <[email protected]>
>
>> Hi devs,
>>
>> This is a very important question, so think carefully. Let me explain:
>>
>> In XWiki (model) we have a few entity types. There are *wikis* which
>> have *spaces* which have *documents*. A document can have *objects*
>> and *attachments*. A document can also define a *class*.
>>
>> At the same time we like to say that in XWiki "everything is a
>> document" because everything revolves around documents. The document
>> is the central notion.
>>
>> We can query the database (using HQL or XWQL) for any of the
>> previously mentioned entities, but what should a Solr query return
>> (semantically)? In other words:
>>
>> * are you searching for an object without caring about the document
>> that holds the object? Same for an object property.
>> * how often are you searching for an attachment without caring about
>> the document that holds the attachment?
>> * are you searching for a class or for the document that defines that
>> class?
>> * are you searching for a wiki without caring about the documents it
>> contains? Same for a space.
>>
>> IMO the result of a Solr query should be, semantically, a list of
>> documents. But maybe I'm wrong.
>>
>> -----------------------
>> Technical Details
>> -----------------------
>>
>> Unlike a relational database, a Solr/Lucene index has a single 'table'.
>> So normally you index a single entity type, and each row in the index
>> represents an entity of that type. As a consequence, the result of a
>> Solr query is semantically a list of entities of that type. In our
>> case the entity type is (naturally) *document*.
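To make the document-centric option concrete, here is a minimal Python sketch, assuming a hypothetical in-memory index in which each document becomes one row of multi-valued fields. The field names (`id`, `class`, `attachment`, `property.<Class>.<prop>`) and the `to_row`/`search` helpers are illustrative only, not XWiki's actual Solr schema; object property values are simplified to single strings.

```python
from dataclasses import dataclass, field


@dataclass
class XObject:
    class_name: str              # e.g. "Blog.BlogPostClass"
    properties: dict             # property name -> value (simplified to strings)


@dataclass
class Document:
    name: str                    # document reference, reused as the row id
    content: str
    objects: list = field(default_factory=list)
    attachments: list = field(default_factory=list)


def to_row(doc: Document) -> dict:
    """Flatten a document, its objects and its attachments into ONE index row."""
    row = {"id": doc.name, "content": doc.content,
           "class": [], "attachment": list(doc.attachments)}
    for obj in doc.objects:
        row["class"].append(obj.class_name)
        for prop, value in obj.properties.items():
            # Hypothetical naming scheme: one multi-valued field per class property.
            row.setdefault(f"property.{obj.class_name}.{prop}", []).append(value)
    return row


def search(index: list, filters: dict) -> list:
    """Return the ids of rows whose multi-valued fields contain every requested value."""
    return [row["id"] for row in index
            if all(value in row.get(name, []) for name, value in filters.items())]


post = Document("Blog.Solr", "Indexing notes",
                objects=[XObject("Blog.BlogPostClass", {"title": "Solr"}),
                         XObject("XWiki.TagClass", {"tags": "search"})])
index = [to_row(post), to_row(Document("Main.WebHome", "Welcome"))]
print(search(index, {"class": "Blog.BlogPostClass",
                     "property.XWiki.TagClass.tags": "search"}))  # ['Blog.Solr']
```

Because the blog post class and the tag value sit on the same row, "blog posts with a given tag" is a single filter with no join.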
>>
>> If you want to index more entity types (e.g. index attachments and
>> objects _separately_, not as part of a document) then, since there is
>> only one 'table' in the index, you need to add a 'type' column that
>> specifies the type of entity you have on each row (e.g. type=document,
>> type=attachment, type=object etc.). The result of a Solr query is now,
>> semantically, a list of different entity types, unless you filter by a
>> specific type. It smells like a hack to me.
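A minimal sketch of the second option, assuming a hypothetical schema with a `type` field on every row: unless the query filters on `type`, one search returns a mix of documents, objects and attachments. `index_typed`, `query` and the `doc` back-reference field are made up for illustration.

```python
SEARCHABLE_SKIP = {"type", "id", "doc"}  # bookkeeping fields, not searched


def index_typed(doc_name, content, objects, attachments):
    """One row per entity, each tagged with a 'type' field (hypothetical schema)."""
    rows = [{"type": "document", "id": doc_name, "content": content}]
    for i, (class_name, props) in enumerate(objects):
        rows.append({"type": "object", "id": f"{doc_name}/{class_name}/{i}",
                     "doc": doc_name, "class": class_name, **props})
    for filename in attachments:
        rows.append({"type": "attachment", "id": f"{doc_name}@{filename}",
                     "doc": doc_name, "filename": filename})
    return rows


def query(index, text, type_filter=None):
    """Naive substring match over the searchable fields, optionally restricted by type."""
    hits = []
    for row in index:
        if type_filter is not None and row["type"] != type_filter:
            continue
        if any(text in str(v) for k, v in row.items() if k not in SEARCHABLE_SKIP):
            hits.append((row["type"], row["id"]))
    return hits


index = index_typed("Sandbox.Solr", "about solr",
                    objects=[("Blog.BlogPostClass", {"title": "solr tips"})],
                    attachments=["solr.pdf"])
print(query(index, "solr"))              # three hits of three different types
print(query(index, "solr", "document"))  # [('document', 'Sandbox.Solr')]
```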
>>
>> Let's imagine what happens if we want to search for blog posts that
>> have a specific tag. With the first approach this is easy because all
>> the (indexed) information is on a single row. With the second approach
>> this is considerably more complex because the information is spread over
>> multiple rows:
>>
>> * one row with type=document for the blog post document
>> * one row with type=object for the blog post object
>> * one row with type=object for the tag object
>>
>> In a relational database, when the information is spread over
>> multiple places (tables), you do joins. Fortunately (you would say),
>> Solr supports joins. In this particular case we would have to perform
>> 2 joins, which means:
>>
>> index X index X index
>>
>> where X represents the cartesian product. The document name would be
>> the join key. This is pretty complex even before trying to write it in
>> Solr query syntax.
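The two joins can be modelled in plain Python as a filtered product of three row sets, with the document name as the join key. This is only a sketch of the logical operation over hypothetical typed rows; Solr's own join support (the `{!join from=... to=...}` query parser) behaves more like a filtering subquery than a SQL join, and this code does not model its implementation.

```python
from itertools import product

# Hypothetical typed rows: the information about one blog post is spread
# over a document row and two object rows, linked by the "doc" field.
index = [
    {"type": "document", "doc": "Blog.Solr"},
    {"type": "object", "doc": "Blog.Solr", "class": "Blog.BlogPostClass"},
    {"type": "object", "doc": "Blog.Solr", "class": "XWiki.TagClass", "tags": "search"},
    {"type": "document", "doc": "Main.WebHome"},
    {"type": "object", "doc": "Main.WebHome", "class": "XWiki.TagClass", "tags": "search"},
]


def blog_posts_tagged(rows, tag):
    """Naive three-way join on the document name: filtered index x index x index."""
    docs = [r for r in rows if r["type"] == "document"]
    posts = [r for r in rows if r.get("class") == "Blog.BlogPostClass"]
    tags = [r for r in rows
            if r.get("class") == "XWiki.TagClass" and r.get("tags") == tag]
    # Keep only the triples that agree on the join key (the document name).
    return sorted({d["doc"] for d, p, t in product(docs, posts, tags)
                   if d["doc"] == p["doc"] == t["doc"]})


print(blog_posts_tagged(index, "search"))  # ['Blog.Solr']
```

Even in this toy form the query has to know which rows to pair and on which key, whereas with one row per document the same question is a simple filter.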
>>
>> So basically the question becomes: is it worth indexing more entities
>> _separately_ instead of indexing just documents (with info about their
>> objects and attachments) considering the complexity that it brings in
>> writing Solr queries? Do we search for objects and attachments alone
>> as separate entities often enough to justify this complexity? My
>> answer is no.
>>
>> Thanks,
>> Marius
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
>
>
>
> --
> Ludovic Dubost
> Founder and CEO
> Blog: http://blog.ludovic.org/
> XWiki: http://www.xwiki.com
> Skype: ldubost GTalk: ldubost
>



