I'm looking to enhance a product recommendation engine. It currently treats all of the data as a single set, and I want to introduce clustering/grouping. It's model-based, and the relationship is the common User-Items relationship.
Originally I was thinking of using Canopy / k-means clustering, then determining which cluster a user is in and running Item Similarity against only that cluster's items. However, I can't figure out how to build a SequenceFile of vectors from the User/Items relationship; I don't know which data points to feed into the vectors.
So I scratched that idea and turned my attention to Lucene, thinking that this is really a document problem, where users are the documents and the items are the content. I should be able to ask Lucene: give me documents that look like this "productId3 productId9056 productId234".
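To make the question concrete, here is a minimal sketch in plain Python (not SequenceFile or Lucene code) of how I picture the two ideas fitting together. It assumes each user becomes a sparse binary vector with one dimension per item, and it uses cosine similarity as a stand-in for whatever scoring Lucene would actually apply. The sample data and the names `build_user_vectors` and `users_like` are made up for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical (user, item) pairs; real data would come from the engine's store.
interactions = [
    ("userA", "productId3"),
    ("userA", "productId9056"),
    ("userA", "productId234"),
    ("userB", "productId3"),
    ("userB", "productId9056"),
    ("userC", "productId77"),
    ("userC", "productId234"),
]


def build_user_vectors(pairs):
    """Map each item to a dimension and each user to a sparse binary vector."""
    item_index = {}
    vectors = defaultdict(dict)
    for user, item in pairs:
        dim = item_index.setdefault(item, len(item_index))
        vectors[user][dim] = 1.0  # assumption: boolean "has this item" preference
    return item_index, dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {dim: weight}."""
    dot = sum(w * b[d] for d, w in a.items() if d in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def users_like(query_items, item_index, vectors, top_n=5):
    """Rank users ("documents") by similarity to a bag of item ids ("content")."""
    query = {item_index[i]: 1.0 for i in query_items if i in item_index}
    scored = [(cosine(query, vec), user) for user, vec in vectors.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(user, score) for score, user in scored[:top_n] if score > 0]


item_index, vectors = build_user_vectors(interactions)
matches = users_like(
    ["productId3", "productId9056", "productId234"], item_index, vectors
)
for user, score in matches:
    print(user, round(score, 3))
```

If this is the right shape, the same per-user vectors would presumably be what gets fed to the clustering step, and the query at the end is the "documents that look like this" lookup done by hand.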
I'm looking for any and all feedback from those experienced in the recommendation world, specifically with the grouping of users and items.
Thanks,
-Jay
