Fabio, interesting document and challenging mission!
There's a whole lot to say about your document, but here are a few gut feelings:

- First, I think you should describe a few application scenarios in more detail; I think you'd come to the conclusion that both an EmbeddedSolrServer and a Solr web app (inside or outside the server) make sense. It looks like this decision is not needed now, as SolrJ offers you an abstraction over both.
- I feared you would not get all the benefits with the EmbeddedSolrServer, in particular caching and auto-warming, but that seems not to be the case: http://lucene.472066.n3.nabble.com/Embedded-Server-Caching-Stats-page-updates-td827632.html
- I find your query examples pretty hairy. Ordinary users want to "just type" and get relevance-ranked results. Solr supports this well with the DisMax query handler (it lets you put a higher weight on the title than on the body, and on the body than on attachments, for example). I would say you need both (the Solr web app's default query handler allows both with an extra prefix). Another major advantage, which the Lucene plugin missed, is that you can have one field that is stemmed and a copy of it that is not; a match in the exact field would rank higher.
- In all the applications I've worked on, indexing pages when they change is not enough, because there are pages that depend on others; this needs to be addressed at the application level (think, e.g., about the dashboard, or about "book" pages that enclose other pages): re-index triggers. Another crucial aspect is to encourage anyone working on a particular schema to be economical. The biggest flaw of the xwiki-lucene-module is that it indexed and stored everything, which meant that a single result document was quite big. Storing fields is typically not useful.
- Particular scenarios will have particular UIs. Would you sketch one that would be the default for 3.2? Would authors be facets? Spaces?
- I would suggest adopting best practices as soon as possible: make evaluations possible by default.
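To make the DisMax point concrete, here is a rough sketch of what such a handler could look like in solrconfig.xml. The handler name, field names, and boost values are my own invention, not taken from Fabio's document:

```xml
<!-- Hypothetical DisMax handler sketch for solrconfig.xml: a plain
     "just type" user query is matched against several fields, with
     title weighted above body, and body above attachment text. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <!-- query fields with boosts -->
    <str name="qf">title^10 body^2 attachment^1</str>
    <str name="rows">10</str>
  </lst>
</requestHandler>
```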
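The stemmed-plus-exact-copy trick could be sketched in schema.xml roughly like this (the field and type names here are hypothetical; assume "text_stemmed" and "text_plain" are field types with and without a stemming filter):

```xml
<!-- Hypothetical schema.xml fragment: "title" is stemmed for recall,
     while copyField duplicates its content into "title_exact", which
     keeps the unstemmed tokens for precise matches. -->
<field name="title"       type="text_stemmed" indexed="true" stored="false"/>
<field name="title_exact" type="text_plain"   indexed="true" stored="false"/>
<copyField source="title" dest="title_exact"/>
```

Combined with a DisMax qf such as `title_exact^20 title^10`, an exact match would outrank a merely stemmed one.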
A typical evaluation would be run by a content expert who knows his documents: he would invent a few queries (e.g. by reading the logs) and check which results are correct or incorrect. That gives you precision and recall at each rank, something you can then collect and tabulate to assess the "mean" quality of a search engine (this paper explains it well: http://www.oracleimg.com/technetwork/database/enterprise-edition/imt-quality-092464.html). I'm just back from a summer school on Information Retrieval and there's a lot there.

I am sorry I cannot offer much time, but I would love to lend a little hand.

paul

On 6 Sept 2011, at 17:29, Fabio Mancinelli wrote:

> Hi everybody,
>
> for the 3.2 release cycle I said that I was going to investigate a bit
> the SOLR search engine and how to use/integrate it in the current
> platform.
> I wrote a document that you can find here:
> http://dev.xwiki.org/xwiki/bin/view/Design/SOLRIntegration about some
> of the things I looked at.
>
> There is a lot of room for discussion/improvement but I think the
> document is already a good starting point.
>
> Feedback is welcome.
>
> Thanks,
> Fabio
> _______________________________________________
> devs mailing list
> [email protected]
> http://lists.xwiki.org/mailman/listinfo/devs
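To make the evaluation idea concrete, here is a minimal sketch, in plain Java with made-up document ids and relevance judgments, of computing precision and recall at each rank for a single query:

```java
import java.util.List;
import java.util.Set;

// Minimal evaluation sketch for one query: given the ranked result ids
// returned by the engine and the set of ids a content expert judged
// relevant, compute precision@k and recall@k for every rank k.
public class EvalSketch {

    // precision@k = relevant results among the top k, divided by k
    static double precisionAt(List<String> ranked, Set<String> relevant, int k) {
        long hits = ranked.subList(0, k).stream().filter(relevant::contains).count();
        return (double) hits / k;
    }

    // recall@k = relevant results among the top k, divided by the
    // total number of relevant documents for this query
    static double recallAt(List<String> ranked, Set<String> relevant, int k) {
        long hits = ranked.subList(0, k).stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up data: the expert judged d1 and d3 relevant.
        List<String> ranked = List.of("d1", "d2", "d3", "d4");
        Set<String> relevant = Set.of("d1", "d3");
        for (int k = 1; k <= ranked.size(); k++) {
            System.out.printf("k=%d precision=%.2f recall=%.2f%n",
                    k, precisionAt(ranked, relevant, k), recallAt(ranked, relevant, k));
        }
    }
}
```

Tabulating these numbers over a handful of expert queries is already enough to compare two configurations of the engine.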

