Re: [xwiki-devs] [Solr] What do we search for?

Marius Dumitru Florea Thu, 14 Nov 2013 08:29:17 -0800

On Wed, Nov 13, 2013 at 8:08 PM, Ludovic Dubost <[email protected]> wrote:
> Hi Marius,
>
> I have a quick question when starting reading your proposal. I don't see
> anything about multi language indexing.
> I remember in the current SOLR implementation that there are multiple
> fields for each language. Would there be a field for each language indexed
> for each property?


Yes. Right now I'm struggling to find a way to define an alias for a
group of dynamic fields. For document title we have this in
solrconfig.xml

<str name="f.title.qf">title__ title_ar title_bg title_ca ...</str>

which makes 'title' an alias for all its translations and allows us to
write title:text in the search query. I need to do the same, but
dynamically, for each object property:

property_Blog.BlogPostClass_title =
property_Blog.BlogPostClass_title__,
property_Blog.BlogPostClass_title_en,
property_Blog.BlogPostClass_title_fr, ...
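
To make the goal concrete, here is a minimal Python sketch of how such an
alias could be generated per property instead of being listed by hand. It
is only an illustration, not what the Solr module does: the function names
(localized_fields, alias_param) and the locale list are hypothetical. It
assumes the f.<alias>.qf form shown above can be produced dynamically (for
instance as a request parameter) and that an alias name containing dots is
accepted; neither assumption is verified here.

```python
def localized_fields(base, locales):
    """Return the per-language field names for one base field.

    An empty locale stands for the default field, written "<base>__",
    matching the title__ / title_ar / title_bg naming shown above.
    """
    return [f"{base}_{locale}" if locale else f"{base}__" for locale in locales]


def alias_param(base, locales):
    """Build a (name, value) pair in the same shape as the f.title.qf entry."""
    return f"f.{base}.qf", " ".join(localized_fields(base, locales))


# Hypothetical locale list; a real one would come from the wiki configuration.
name, value = alias_param("property_Blog.BlogPostClass_title", ["", "en", "fr"])
print(name)
print(value)
```

The same helper reproduces the existing title alias when called with
"title", so the hand-written list and the generated one stay in the same
format.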

I'll keep you posted.

Thanks,
Marius

>
> Ludovic
>
>
> 2013/10/14 Marius Dumitru Florea <[email protected]>
>
>> I started writing
>> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema . I need help
>> with two things:
>>
>> * test cases
>> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HTestCases
>> * if time permits, review the proposal, especially
>> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HAMixedApproach
>> .
>>
>> Thanks,
>> Marius
>>
>>
>> On Fri, Oct 11, 2013 at 12:55 PM, Marius Dumitru Florea
>> <[email protected]> wrote:
>> > Hi devs,
>> >
>> > This is a very important question so think carefully. Let me explain:
>> >
>> > In XWiki (model) we have a few entity types. There are *wikis* which
>> > have *spaces* which have *documents*. A document can have *objects*
>> > and *attachments*. A document can also define a *class*.
>> >
>> > At the same time we like to say that in XWiki "everything is a
>> > document" because everything revolves around documents. The document
>> > is the central notion.
>> >
>> > We can query the database (using HQL or XWQL) for any of the
>> > previously mentioned entities but what should a Solr query return
>> > (semantically)? In other words:
>> >
>> > * are you searching for an object without caring about the document
>> > that holds the object? Same for an object property.
>> > * how often are you searching for an attachment without caring about
>> > the document that holds the attachment?
>> > * are you searching for a class or for the document that defines that
>> > class?
>> > * are you searching for a wiki without caring about the documents it
>> > contains? Same for a space.
>> >
>> > IMO the result of a Solr query should be, semantically, a list of
>> > documents. But maybe I'm wrong.
>> >
>> > -----------------------
>> > Technical Details
>> > -----------------------
>> >
>> > Unlike a relational database, a Solr/Lucene index has a single 'table'.
>> > So normally you index a single entity type. Each row in the index
>> > represents an entity of that type. As a consequence the result of a
>> > Solr query is semantically a list of entities of that type. In our
>> > case the entity type is (naturally) *document*.
>> >
>> > If you want to index more entity types (e.g. index attachments and
>> > objects _separately_, not as part of a document) then, since there is
>> > only one 'table' in the index, you need to add a 'type' column that
>> > specifies the type of entity you have on each row (e.g. type=document,
>> > type=attachment, type=object etc.). The result of a Solr query is now,
>> > semantically, a list of different entity types, unless you filter by a
>> > specific type. It smells like a hack to me.
>> >
>> > Let's imagine what happens if we want to search for blog posts that
>> > have a specific tag. With the first approach this is easy because all
>> > the (indexed) information is on a single row. With the second approach
>> > this is considerably more complex because the information is spread over
>> > multiple rows:
>> >
>> > * one row with type=document for the blog post document
>> > * one row with type=object for the blog post object
>> > * one row with type=object for the tag object
>> >
>> > In a relational database when you have the information spread in
>> > multiple places (tables) you do joins. Fortunately (you would say)
>> > Solr supports joins. In this particular case we would have to perform
>> > 2 joins which means:
>> >
>> > index X index X index
>> >
>> > where X represents the Cartesian product. The document name would be
>> > the join key. Pretty complex even before trying to write this in Solr
>> > query syntax.
>> >
>> > So basically the question becomes: is it worth indexing more entities
>> > _separately_ instead of indexing just documents (with info about their
>> > objects and attachments) considering the complexity that it brings in
>> > writing Solr queries? Do we search for objects and attachments alone
>> > as separate entities often enough to justify this complexity? My
>> > answer is no.
>> >
>> > Thanks,
>> > Marius
>> _______________________________________________
>> devs mailing list
>> [email protected]
>> http://lists.xwiki.org/mailman/listinfo/devs
>>
>
>
>
> --
> Ludovic Dubost
> Founder and CEO
> Blog: http://blog.ludovic.org/
> XWiki: http://www.xwiki.com
> Skype: ldubost GTalk: ldubost