Actually, I kind of misspoke. The fastest way to do this is in fact to
use a GenericUserBasedRecommender -- not because you need
recommendations, but because it exposes a convenient mostSimilarUsers()
method.

DataModel model = new MySQLBooleanPrefJDBCDataModel(...); // or whatever you are using
UserSimilarity similarity = new BooleanTanimotoCoefficientSimilarity(model);
UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, model);
BooleanUserGenericUserBasedRecommender recommender =
    new BooleanUserGenericUserBasedRecommender(model, neighborhood, similarity);
Rescorer<Pair<User,User>> rescorer = new Rescorer<Pair<User,User>>() {
  // implement your rescoring logic to affect how similar 'users' are --
  // boost the returned value for popular queries
};
Object userID = ...; // current query ID
List<User> similarUsers = recommender.mostSimilarUsers(userID, 10, rescorer);
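
As for what the rescoring logic might compute, here is a minimal sketch
in plain Java. It is one possible function, not something that ships
with Mahout: PopularityBoost, queryCounts and alpha are all hypothetical
names, and it assumes the similarity is already in [0,1] (as a Tanimoto
coefficient is). It shrinks the gap between the similarity and 1 as the
rarer of the two queries is observed more often.

```java
import java.util.Map;

/** Hypothetical helper, not part of Mahout: popularity-based similarity boost. */
public final class PopularityBoost {

  private final Map<String, Integer> queryCounts; // query ID -> times observed (assumed input)
  private final double alpha;                     // boost strength, an assumed tuning knob

  public PopularityBoost(Map<String, Integer> queryCounts, double alpha) {
    this.queryCounts = queryCounts;
    this.alpha = alpha;
  }

  /** Moves a similarity in [0,1] toward 1 the more often both queries were observed. */
  public double boost(String queryA, String queryB, double similarity) {
    int countA = queryCounts.getOrDefault(queryA, 0);
    int countB = queryCounts.getOrDefault(queryB, 0);
    // Use the rarer query so one very popular query cannot dominate the pair.
    int support = Math.min(countA, countB);
    // damping is 1 for unseen queries and grows logarithmically with support.
    double damping = 1.0 + alpha * Math.log1p(support);
    return 1.0 - (1.0 - similarity) / damping;
  }
}
```

You would call boost() from inside your Rescorer implementation. A pair
involving a query that has never been observed keeps its original
similarity, and the result always stays below 1 unless the similarity
was already 1.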

On Mon, Jul 20, 2009 at 11:18 AM, Sean Owen<[email protected]> wrote:
> First, Lucene itself has related-search functionality. If you are using it,
> I suspect it will be better to leverage that. (Even if not it is possible to
> run Lucene just for this purpose.) Others much more familiar with Lucene can
> comment.
>
> I can comment on your current approach. Yes, it seems reasonable. You are
> effectively using just a piece of a user-based recommender, and that is the
> UserNeighborhood component. This is all you need, such as
> NearestNUserNeighborhood and your similarity metric.
>
> Scalability could be an issue, since you have a 'user' for every distinct
> query. Consider normalizing queries a lot - decapitalize, remove whitespace,
> sort by query term, keep only first n terms, etc.
>
> Also for this reason, consider using
> BooleanUserTanimotoCoefficientSimilarity and BooleanPrefUserDataModel (off
> the top of my head, even I am not sure I got those names right!). Because you
> do not have a reliable notion of strength of preference, you should ignore
> preference value, and these implementations let you take advantage of this.
>
> Please use the very latest code from Subversion (Mahout 0.2 SNAPSHOT). I am
> right now working with a client using these implementations and have been
> fixing and optimizing them a lot recently.
>
> Finally, you want to boost queries that are popular. You can use a Rescorer
> for this to inject any score changes you like. Perhaps you devise some
> function that increases the similarity towards 1 the more the two queries
> are observed.
>
> Let us start there and follow up with questions here.
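
The query normalization suggested in the quoted message could look
roughly like the sketch below. QueryNormalizer and maxTerms are
hypothetical names, and two details are my assumptions: "remove
whitespace" is read as trimming and collapsing runs of whitespace, and
the first n terms are kept before sorting rather than after.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/** Hypothetical helper: maps variants of a query onto one canonical 'user' key. */
public final class QueryNormalizer {

  private QueryNormalizer() {}

  public static String normalize(String rawQuery, int maxTerms) {
    // Lowercase and trim so "Mahout " and "mahout" become the same key.
    String cleaned = rawQuery.trim().toLowerCase(Locale.ROOT);
    if (cleaned.isEmpty()) {
      return "";
    }
    return Arrays.stream(cleaned.split("\\s+")) // split on any run of whitespace
        .limit(maxTerms)                        // keep only the first n terms as typed
        .sorted()                               // then sort so term order no longer matters
        .collect(Collectors.joining(" "));
  }
}
```

For example, QueryNormalizer.normalize("  Mahout   Apache recommender ", 2)
yields "apache mahout", the same key as for "apache MAHOUT".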
