Fabio,

interesting document and challenging mission!

There's a whole lot to say about your document, but here are a few gut feelings:

- first, I think you should describe a few application scenarios in more 
detail; I think you'd come to the conclusion that both an EmbeddedServer and 
a Solr web app (inside the servlet container or outside) make sense. It looks 
like this decision is not needed now, as SolrJ offers you an abstraction.
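To make the abstraction point concrete, here is a hedged sketch (not from the design document) of how SolrJ, as of Solr 3.x, lets the rest of the platform code against the common SolrServer type, so the embedded-vs-webapp decision can be deferred. The class names are from the SolrJ API; the factory class, URL, and core wiring are illustrative:

```java
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.core.CoreContainer;

// Illustrative factory: both deployment styles hide behind SolrServer,
// so indexing and query code never needs to know which one is in use.
public class SolrBackendFactory {

    // Web-app deployment: talk to a running Solr instance over HTTP.
    static SolrServer remote(String url) throws Exception {
        return new CommonsHttpSolrServer(url);
    }

    // Embedded deployment: run Solr in-process against a local core.
    static SolrServer embedded(CoreContainer container, String coreName) {
        return new EmbeddedSolrServer(container, coreName);
    }
}
```

Either way, the calling code only ever holds a SolrServer reference, which is the abstraction being relied on above.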

- I was afraid you would not get all the benefits with the EmbeddedServer, in 
particular caching and auto-warming, but that seems not to be the case:
  
http://lucene.472066.n3.nabble.com/Embedded-Server-Caching-Stats-page-updates-td827632.html

- Your query examples are pretty hairy, I find. Ordinary users want to "just 
type" and find relevance-ranked results. Solr supports this well with the 
DisMax query handler (it lets you put a higher weight on title than on body, 
and on body than on attachments, for example...). I would say you need both 
(the Solr web app's default query handler allows both with an extra prefix). 
Another major advantage, which the Lucene plugin missed, is that you can have 
one field that is stemmed and a copy of it that is not. A match in the exact 
field would rank higher.
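To sketch the stemmed/exact idea: a hedged schema.xml fragment, with field names, types, and boosts that are illustrative rather than taken from the design document:

```xml
<!-- Stemmed version: a query for "searching" also matches "search". -->
<field name="title" type="text" indexed="true" stored="true"/>
<!-- Unstemmed copy: a hit here is an exact match and can be boosted
     above a merely stemmed one. -->
<field name="title_exact" type="text_ws" indexed="true" stored="false"/>

<!-- Populate the exact field automatically from the stemmed one. -->
<copyField source="title" dest="title_exact"/>
```

A dismax handler in solrconfig.xml can then express the field weighting, e.g. with a `qf` parameter along the lines of `title_exact^10 title^4 body^2 attachment` (again, illustrative boost values).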

- In all applications I've worked on, indexing pages when they change is not 
enough, because there are pages that depend on others; this needs to be 
addressed at the application level (think, e.g., about the dashboard, or about 
"book" pages that enclose others): re-index triggers.
Another crucial aspect is to encourage anyone working on a particular schema to 
be economical. The biggest flaw of the xwiki-lucene-module is that it indexed 
and stored everything; that meant that a single result document was quite big. 
Storing is typically not useful.
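In schema.xml terms, being economical mostly means marking the heavy fields as indexed but not stored, so result documents stay small. A hedged, illustrative fragment (field names are assumptions, not from the document):

```xml
<!-- Heavy fields: searchable, but not returned in results. -->
<field name="body"       type="text"   indexed="true" stored="false"/>
<field name="attachment" type="text"   indexed="true" stored="false"/>

<!-- Store only what the result list actually needs to display. -->
<field name="id"         type="string" indexed="true" stored="true"/>
<field name="title"      type="text"   indexed="true" stored="true"/>
```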

- particular scenarios will have particular UIs. Would you sketch the one that 
would be the default for 3.2? Would authors be facets? Spaces?

- I would suggest establishing best practices as soon as possible: make 
evaluations possible by default. A typical evaluation would be run by a content 
expert who knows his documents: he would invent a few queries (e.g. by reading 
the logs) and mark each result as correct or incorrect, which gives precision 
and recall at each rank, something you can then collect and tabulate to assess 
the "mean" quality of a search engine (this paper explains it well: 
http://www.oracleimg.com/technetwork/database/enterprise-edition/imt-quality-092464.html
). I'm just back from a summer school on Information Retrieval and there's a 
lot of material there.
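For reference, the two per-query measures mentioned above are the standard IR definitions (textbook material, not taken from the design document), evaluated at each cutoff k in the result list:

```latex
\mathrm{Precision@}k = \frac{|\,\text{relevant} \cap \text{top-}k\ \text{retrieved}\,|}{k}
\qquad
\mathrm{Recall@}k = \frac{|\,\text{relevant} \cap \text{top-}k\ \text{retrieved}\,|}{|\,\text{relevant}\,|}
```

Averaging these over the expert's set of queries gives the "mean" quality figures that can be tabulated and compared between configurations.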

I am sorry I cannot offer much time but I would love to lend a little hand.

paul

On 6 Sept 2011, at 17:29, Fabio Mancinelli wrote:

> Hi everybody,
> 
> for the 3.2 release cycle I said that I was going to investigate a bit
> the SOLR search engine and how to use/integrate it in the current
> platform.
> I wrote a document that you can find here:
> http://dev.xwiki.org/xwiki/bin/view/Design/SOLRIntegration about some
> of the things I looked at.
> 
> There is a lot of room for discussion/improvement but I think the
> document is already a good starting point.
> 
> Feedback is welcome.
> 
> Thanks,
> Fabio
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
