Hi Marius,

I have a quick question after starting to read your proposal: I don't see
anything about multilingual indexing.
I remember that in the current Solr implementation there are separate
fields for each language. Would there be a field for each indexed language
for each property?
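To make the question concrete, here is a minimal sketch assuming a hypothetical naming scheme in which every indexed property gets one field per language (a `<property>_<language>` suffix). The helper name `localized_fields` and the suffix convention are illustrative assumptions, not the actual XWiki Solr schema.

```python
from __future__ import annotations


def localized_fields(properties: dict[str, dict[str, str]]) -> dict[str, str]:
    """Expand {property: {language: value}} into one field per (property, language).

    Hypothetical naming: property "title" in language "fr" becomes field "title_fr".
    """
    return {
        f"{prop}_{lang}": value
        for prop, values_by_lang in properties.items()
        for lang, value in values_by_lang.items()
    }


fields = localized_fields({
    "title": {"en": "Release notes", "fr": "Notes de version"},
    "summary": {"en": "What changed"},
})
print(sorted(fields))  # ['summary_en', 'title_en', 'title_fr']
```

Under this scheme the number of fields grows as (number of properties) × (number of indexed languages).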

Ludovic


2013/10/14 Marius Dumitru Florea <[email protected]>

> I started writing
> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema . I need help
> with two things:
>
> * test cases
> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HTestCases
> * if time permits, review the proposal, especially
> http://dev.xwiki.org/xwiki/bin/view/Design/SolrSchema#HAMixedApproach
>
> Thanks,
> Marius
>
>
> On Fri, Oct 11, 2013 at 12:55 PM, Marius Dumitru Florea
> <[email protected]> wrote:
> > Hi devs,
> >
> > This is a very important question, so please think carefully. Let me explain:
> >
> > In XWiki (model) we have a few entity types. There are *wikis* which
> > have *spaces* which have *documents*. A document can have *objects*
> > and *attachments*. A document can also define a *class*.
> >
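The containment hierarchy described above can be sketched as plain Python data classes. This is a simplified illustration, not XWiki's real model API: the class and field names (`Wiki`, `Space`, `Document`, `XObject`, `Attachment`, `class_fields`) and the helper `all_documents` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class XObject:
    class_name: str  # name of the document that defines the object's class
    properties: dict[str, str] = field(default_factory=dict)


@dataclass
class Attachment:
    filename: str


@dataclass
class Document:
    name: str
    objects: list[XObject] = field(default_factory=list)
    attachments: list[Attachment] = field(default_factory=list)
    class_fields: list[str] = field(default_factory=list)  # non-empty if it defines a class


@dataclass
class Space:
    name: str
    documents: list[Document] = field(default_factory=list)


@dataclass
class Wiki:
    name: str
    spaces: list[Space] = field(default_factory=list)


def all_documents(wiki: Wiki) -> list[Document]:
    """Objects, attachments and class definitions are all reached through a document."""
    return [doc for space in wiki.spaces for doc in space.documents]


wiki = Wiki("xwiki", [
    Space("Blog", [Document("Blog.MyPost",
                            objects=[XObject("XWiki.TagClass", {"tags": "solr"})],
                            attachments=[Attachment("diagram.png")])]),
    Space("Main", [Document("Main.WebHome")]),
])
print([d.name for d in all_documents(wiki)])  # ['Blog.MyPost', 'Main.WebHome']
```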
> > At the same time we like to say that in XWiki "everything is a
> > document" because everything revolves around documents. The document
> > is the central notion.
> >
> > We can query the database (using HQL or XWQL) for any of the
> > previously mentioned entities, but what should a Solr query return
> > (semantically)? In other words:
> >
> > * are you searching for an object without caring about the document
> > that holds the object? Same for an object property.
> > * how often are you searching for an attachment without caring about
> > the document that holds the attachment?
> > * are you searching for a class or for the document that defines that
> > class?
> > * are you searching for a wiki without caring about the documents it
> > contains? Same for a space.
> >
> > IMO the result of a Solr query should be, semantically, a list of
> > documents. But maybe I'm wrong.
> >
> > -----------------------
> > Technical Details
> > -----------------------
> >
> > Unlike a relational database, a Solr/Lucene index has a single 'table'.
> > So normally you index a single entity type. Each row in the index
> > represents an entity of that type. As a consequence the result of a
> > Solr query is semantically a list of entities of that type. In our
> > case the entity type is (naturally) *document*.
> >
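A minimal sketch of the one-row-per-document approach, assuming a hypothetical field naming in which each object property becomes a `<class>.<property>` field and attachment names go into a multi-valued `attachment` field. A row is modelled as a plain dict of lists rather than a real Solr document, and the class names are only illustrative.

```python
from __future__ import annotations


def document_row(name: str, content: str,
                 objects: list[tuple[str, dict[str, str]]],
                 attachments: list[str]) -> dict[str, list[str]]:
    """Flatten a document, its objects and its attachments into ONE index row."""
    # Fields are multi-valued (lists): a document may hold several objects
    # of the same class, or several attachments.
    row: dict[str, list[str]] = {"name": [name], "content": [content]}
    for class_name, props in objects:
        for prop, value in props.items():
            row.setdefault(f"{class_name}.{prop}", []).append(value)
    row["attachment"] = list(attachments)
    return row


row = document_row(
    "Blog.MyPost", "Hello world",
    objects=[("Blog.BlogPostClass", {"title": "My post"}),
             ("XWiki.TagClass", {"tags": "solr"})],
    attachments=["diagram.png"],
)
print(row["XWiki.TagClass.tags"], row["attachment"])  # ['solr'] ['diagram.png']
```

Because everything about the document sits on one row, a query can combine content, object and attachment conditions without any join.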
> > If you want to index more entity types (e.g. index attachments and
> > objects _separately_, not as part of a document) then, since there is
> > only one 'table' in the index, you need to add a 'type' column that
> > specifies the type of entity you have on each row (e.g. type=document,
> > type=attachment, type=object, etc.). The result of a Solr query is now,
> > semantically, a list of entities of different types, unless you filter
> > by a specific type. It smells like a hack to me.
> >
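The typed-row layout can be sketched as below, assuming hypothetical `type` and `doc` fields (the latter pointing back to the owning document). Without a type restriction one query returns a mix of entity types; the `entity_type` argument plays the role that a Solr filter query such as `fq=type:document` would play.

```python
from __future__ import annotations

from typing import Callable, Optional

# One 'table', several entity types: the 'type' column tells them apart.
rows = [
    {"id": "Blog.MyPost", "type": "document", "content": "Hello world"},
    {"id": "Blog.MyPost/diagram.png", "type": "attachment", "doc": "Blog.MyPost"},
    {"id": "Blog.MyPost/XWiki.TagClass/0", "type": "object", "doc": "Blog.MyPost",
     "tags": "solr"},
]


def search(rows: list[dict[str, str]],
           predicate: Callable[[dict[str, str]], bool],
           entity_type: Optional[str] = None) -> list[dict[str, str]]:
    """Return matching rows, optionally restricted to one entity type."""
    return [r for r in rows
            if predicate(r) and (entity_type is None or r["type"] == entity_type)]


def about_post(r: dict[str, str]) -> bool:
    # Matches the document row itself and every row that belongs to it.
    return "Blog.MyPost" in (r["id"], r.get("doc"))


mixed = search(rows, about_post)  # rows of three different entity types
only_docs = search(rows, about_post, entity_type="document")
print([r["type"] for r in mixed], [r["id"] for r in only_docs])
# ['document', 'attachment', 'object'] ['Blog.MyPost']
```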
> > Let's imagine what happens if we want to search for blog posts that
> > have a specific tag. With the first approach this is easy because all
> > the (indexed) information is on a single row. With the second approach
> > this is considerably more complex because the information is spread on
> > multiple rows:
> >
> > * one row with type=document for the blog post document
> > * one row with type=object for the blog post object
> > * one row with type=object for the tag object
> >
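With the first approach the blog-post-with-tag search is a single pass in which every condition is checked against the same row. A minimal sketch, assuming hypothetical multi-valued `class` and `tags` fields on each document row; the class names are illustrative.

```python
from __future__ import annotations

# First approach: one row per document; object data is flattened into it.
doc_rows = [
    {"name": "Blog.Post1", "class": ["Blog.BlogPostClass", "XWiki.TagClass"], "tags": ["solr"]},
    {"name": "Blog.Post2", "class": ["Blog.BlogPostClass", "XWiki.TagClass"], "tags": ["xwiki"]},
    {"name": "Main.Notes", "class": ["XWiki.TagClass"], "tags": ["solr"]},
]


def blog_posts_tagged(rows: list[dict], tag: str) -> list[str]:
    # Both conditions are evaluated on the same row: no join is needed.
    return [r["name"] for r in rows
            if "Blog.BlogPostClass" in r["class"] and tag in r["tags"]]


print(blog_posts_tagged(doc_rows, "solr"))  # ['Blog.Post1']
```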
> > In a relational database, when the information is spread across
> > multiple places (tables), you do joins. Fortunately (you would say)
> > Solr supports joins. In this particular case we would have to perform
> > 2 joins which means:
> >
> > index X index X index
> >
> > where X represents a join, i.e. a Cartesian product restricted to the
> > rows whose join keys match. The document name would be the join key.
> > Pretty complex even before trying to write this in Solr query syntax.
> >
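For comparison, a naive sketch of the second approach: the same question needs two joins on the document name, written here as nested loops over the three row sets (conceptually index X index X index, keeping only the combinations whose keys match). The field names are hypothetical, and this only illustrates the idea; it is not how Solr's join query parser executes joins.

```python
from __future__ import annotations

# Second approach: one row per entity; 'doc' points back to the owning document.
rows = [
    {"type": "document", "name": "Blog.Post1"},
    {"type": "object", "doc": "Blog.Post1", "class": "Blog.BlogPostClass"},
    {"type": "object", "doc": "Blog.Post1", "class": "XWiki.TagClass", "tags": "solr"},
    {"type": "document", "name": "Blog.Post2"},
    {"type": "object", "doc": "Blog.Post2", "class": "Blog.BlogPostClass"},
    {"type": "object", "doc": "Blog.Post2", "class": "XWiki.TagClass", "tags": "xwiki"},
    {"type": "document", "name": "Main.Notes"},
    {"type": "object", "doc": "Main.Notes", "class": "XWiki.TagClass", "tags": "solr"},
]


def blog_posts_tagged(rows: list[dict[str, str]], tag: str) -> list[str]:
    docs = [r for r in rows if r["type"] == "document"]
    posts = [r for r in rows
             if r["type"] == "object" and r["class"] == "Blog.BlogPostClass"]
    tagged = [r for r in rows
              if r["type"] == "object" and r["class"] == "XWiki.TagClass"
              and r.get("tags") == tag]
    # Two joins = docs x posts x tagged, keeping only the combinations
    # where all three rows share the same document name (the join key).
    return [d["name"] for d in docs for p in posts for t in tagged
            if d["name"] == p["doc"] == t["doc"]]


print(blog_posts_tagged(rows, "solr"))  # ['Blog.Post1']
```

The naive version examines every combination of the three row sets, so its cost grows with the product of their sizes.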
> > So basically the question becomes: is it worth indexing more entities
> > _separately_ instead of indexing just documents (with info about their
> > objects and attachments) considering the complexity that it brings in
> > writing Solr queries? Do we search for objects and attachments alone
> > as separate entities often enough to justify this complexity? My
> > answer is no.
> >
> > Thanks,
> > Marius
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
>



-- 
Ludovic Dubost
Founder and CEO
Blog: http://blog.ludovic.org/
XWiki: http://www.xwiki.com
Skype: ldubost GTalk: ldubost