This has been implemented in open source, though not, I think, with Lucene:

http://www.cs.put.poznan.pl/dweiss/carrot/
http://carrot2.sourceforge.net/

Dawid Weiss is an academic at Poznan University of Technology in Poland. He and others have implemented a servlet-based web app built from pipelined components that communicate over HTTP, and it implements a couple of clustering algorithms.

Clustering, of course, can go well beyond search result presentation, and there are some very suggestive examples at http://www.sics.se/humle/socialcomputing/ where the encore project (Martin Svennson) is based on orthogonal transformations of a large sparse matrix (one possible method for matrix dimension reduction).

I think it would be interesting to hook a recommender system into Lucene. Clustering would then take place on the basis of a user profile, which could be built up automatically by accumulating clicks and comparing them with those of other visitors, with some intelligent weighting of node inputs. This calls into question what a search really is: does it have to be instigated by the user, or might their context and history suggest enough to pull in additional material? This would sit on top of snippets, and would also influence which snippets are returned as well as how they are presented.

Cooler still would be to recognise the user without a login. This might be implemented with cookies, but the aim would be to deal with the user in terms of types of interests, as a series of faceted profiles, so that portals could become fluidly dynamic.

It sounds far-fetched, but I actually think it is just round the corner. Let me know if this is of interest.
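To make the click-profile idea a little more concrete, here is a rough sketch in Python. It is only an illustration under my own assumptions, not Lucene or Carrot2 code: the function names (tokenize, build_profile, cosine, rerank), the decay factor and the blending weight alpha are all made up for the example, and the tokenizer is a crude stand-in for a real analyzer. The profile is just a bag of weighted terms taken from the snippets a visitor clicked, and hits are re-ordered by blending the engine score with the similarity between each snippet and that profile.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case, split on non-letters, drop very short tokens.
    A crude stand-in for a real analyzer (assumption)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]


def build_profile(clicked_snippets, decay=0.8):
    """Accumulate term weights from clicked snippets, oldest first.
    Before each new click the existing weights are multiplied by
    `decay`, so older clicks count for less (hypothetical weighting)."""
    profile = Counter()
    for snippet in clicked_snippets:
        for term in list(profile):
            profile[term] *= decay
        profile.update(tokenize(snippet))
    return profile


def cosine(a, b):
    """Cosine similarity between two term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rerank(hits, profile, alpha=0.5):
    """Re-order (snippet, engine_score) pairs, engine_score in [0, 1].
    The final score blends the engine score with profile similarity;
    alpha is an illustrative knob, not a tuned value."""
    scored = [
        (alpha * score + (1 - alpha) * cosine(Counter(tokenize(snippet)), profile),
         snippet)
        for snippet, score in hits
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [snippet for _, snippet in scored]


if __name__ == "__main__":
    clicks = [
        "Lucene index clustering of search results",
        "clustering snippets with suffix trees",
    ]
    profile = build_profile(clicks)
    hits = [
        ("cheap flights to Poznan", 0.6),
        ("clustering search results by topic", 0.5),
    ]
    for snippet in rerank(hits, profile):
        print(snippet)
```

The same cosine function could be used to compare one visitor's profile with another's, which is where the "comparing with other visitors" part would come in. A real version would want proper term weighting and probably several profiles per user, one per facet of interest.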
Adam

> -----Original Message-----
> From: integer [daniel prawdzik] [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 26, 2005 5:17 PM
> To: lucene-dev@jakarta.apache.org
> Subject: -> Grouping Search Results by Clustering Snippets:
>
> Grouping Search Results by Clustering Snippets:
>
> Search engines typically present their results as long unsorted lists.
> Finding the page you’re looking for is often time-consuming and
> unsatisfying.
> Showing the results in groups of similar topics is a much more
> suitable way to give a user a quick overview of the results.
> This can be done with a technique called cluster analysis. I’m
> currently working on my diploma (master's) thesis on this topic. In my
> view, it’s too nice to be written only for the archive, so I want to
> implement this feature in open-source software. The coding of this
> program has already gone pretty far; I’ve run some tests and the
> results are impressive and might still get better [you can see some
> results on http://www.trist.de/CV/Text-Mining/ -> sorry, only in German]
>
> To make a long story short:
> I’m wondering whether this is an attractive feature for the Lucene
> community?
>
> regards,
> integer
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]