Hi Stephane,

I don't have much time right now, but I just wanted to let you know that IMHO your list of goals/tasks sounds completely reasonable. If you need it, I may be able to help over the coming weeks.
Regards,
Tommaso

2013/10/2 Stephane Gamard <[email protected]>

> Hi Team,
>
> My name's Stephane and I am currently participating in the Fusepool FP7
> project. Within this project we are using Stanbol & Clerezza as key
> architectural components. Coming from a pure FullText search and
> Information Retrieval background, I find myself in somewhat of a new
> territory.
>
> But within that new territory there is a link to my area of expertise:
> Lucene/Solr in the rdf.cris package. This package turns out to be crucial
> for our project and I would gladly participate and contribute my knowledge
> as a Lucene and Solr developer. So here, in a nutshell, is a list of "small
> contributions" to start with:
>
> - Abstraction Refactoring
> Currently CRIS is using Lucene as its FT engine, but we might want to
> eventually go to Solr (or elasticsearch for XYZ reasons). The first step
> would be to remove all Lucene dependencies from the rdf.cris package and
> push the implementation into an rdf.cris.lucene package.
>
> - Lucene 4.x Branch
> There have been a large number of changes since the 2.x and 3.x branches
> of Lucene. I'd propose a small refactor and overhaul of the rdf.cris.lucene
> package to take advantage of Lucene's new features (Facets, SearcherManager,
> …)
>
> - Solr Implementation
> In line with "in production", I strongly believe Clerezza's CRIS component
> should be able to leverage established services without having to manage
> their scalability. That goes for FullText Search most obviously. The idea
> is to be able to use a remote Solr server (Solr since it comes with the
> whole pseudo-REST servicing on top of Lucene).
>
> - Fine Grained Search
> As a logical evolution from the points above, it would then be perfect if
> Clerezza's fulltext search capabilities could benefit from all the features
> of Lucene/Solr. I am especially thinking about:
> -- Field/Analyzer specialisation (we don't compare authors, dates and text
> in the same way in Lucene/Solr)
> -- Boosting (for IR, the title of a document usually yields more important
> information than its footnotes)
> -- Advanced facets (things like date-range facets, pivot facets (called 2nd
> level facets in fusepool))
> -- Geolocalised searches (big thing in the Lucene/Solr 4.x branch… would
> eventually be a nice to have)
>
> I will execute this work over the next few weeks/months as part of the
> fusepool project, but most of all I would be pleased and interested to
> finally get a top-notch implementation of a cross RDF-text solution. Very
> much looking forward to your feedback and hopefully support ;)
>
> PS: whoever initiated the GraphIndexer implementation did an excellent
> job! Will hopefully follow in his footsteps!
>
> Cheers,
>
> _Stephane
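
To make the "Abstraction Refactoring" item a bit more concrete, here is a rough sketch of what an engine-neutral contract could look like. Everything in it is an assumption for illustration only: FullTextIndex, InMemoryFullTextIndex and their methods are hypothetical names, not the existing rdf.cris API, and the in-memory class merely stands in for a Lucene- or Solr-backed implementation so that the example runs without any extra dependency.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical engine-neutral contract that rdf.cris could depend on.
// The name and methods are illustrative, not the existing Clerezza API.
interface FullTextIndex {
    /** Index (or replace) the text of one field of the given resource. */
    void add(String resourceUri, String field, String text);

    /** URIs of resources whose field contains every query term, in insertion order. */
    List<String> search(String field, String query);
}

// Stand-in implementation so the sketch runs without Lucene on the classpath.
// A Lucene- or Solr-backed class (for example in rdf.cris.lucene) would
// implement the same interface instead.
final class InMemoryFullTextIndex implements FullTextIndex {
    // resource URI -> (field name -> lower-cased text)
    private final Map<String, Map<String, String>> docs = new LinkedHashMap<>();

    @Override
    public void add(String resourceUri, String field, String text) {
        docs.computeIfAbsent(resourceUri, k -> new LinkedHashMap<>())
            .put(field, text.toLowerCase(Locale.ROOT));
    }

    @Override
    public List<String> search(String field, String query) {
        // Naive case-insensitive substring matching; a real engine would use analyzers.
        String[] terms = query.toLowerCase(Locale.ROOT).trim().split("\\s+");
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : docs.entrySet()) {
            String text = entry.getValue().get(field);
            if (text == null) {
                continue; // resource has no value for this field
            }
            boolean matchesAll = true;
            for (String term : terms) {
                if (!text.contains(term)) {
                    matchesAll = false;
                    break;
                }
            }
            if (matchesAll) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}

final class CrisIndexDemo {
    public static void main(String[] args) {
        // Calling code only sees the interface, so the engine can be swapped later.
        FullTextIndex index = new InMemoryFullTextIndex();
        index.add("urn:doc:1", "title", "Lucene in Action");
        index.add("urn:doc:2", "title", "Solr in Production");
        System.out.println(index.search("title", "lucene"));
    }
}
```

The point of the design is that code in rdf.cris would only reference the interface, so moving from an embedded Lucene index to a remote Solr server would mean adding another implementation rather than touching the callers.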
