You could do that. But then, the system would be recommending words to documents! Not quite what you want. I assume you still want to recommend documents to (real) users.
I would use other techniques to determine document similarity. Others on this list can suggest ideas, but simple metrics based on word frequency should do well. Then use that logic to create an implementation of ItemSimilarity.

Next, build a DataModel, perhaps a FileDataModel, maybe from a file containing user IDs, document IDs, and preference values. Then try a GenericItemBasedRecommender based on these components. We can discuss these in more detail later.

Assuming you go this way, a couple thousand documents (and a couple thousand users?) should be no problem to process in memory. It should be fast. I would, perhaps, make sure that your ItemSimilarity caches results, or is based on pre-computed values, since recomputing them over and over at runtime would be slow.

Sean

On Apr 3, 2009 7:14 AM, "Vinicius Carvalho" <[email protected]> wrote:

Hi there! I would like to build a document recommendation system, and one of the approaches I wish to experiment with is using Taste for that task. One idea I had was to model documents as users, words as items, and word frequencies in documents as preferences. Am I going in the right direction here?

Also, I'm a bit afraid about memory consumption. So far we only have 6k documents (which may have a few hundred words per doc). But would Taste scale to, let's say, 100k documents with a few hundred words each?

Best regards

--
The intuitive mind is a sacred gift and the rational mind is a faithful servant. We have created a society that honors the servant and has forgotten the gift.
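
As a rough illustration of the "simple metrics based on word frequency" idea and the caching advice in the reply above, here is a minimal Java sketch. It is not Taste code and uses only the JDK: `DocumentSimilarity` and its methods are hypothetical names, and the tokenizer and the choice of cosine similarity are assumptions made for illustration. The `similarity` method is the kind of logic an ItemSimilarity implementation could delegate to.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Taste): cosine similarity over word counts, with a cache. */
class DocumentSimilarity {

    // Word-frequency vector for each document ID.
    private final Map<Long, Map<String, Integer>> termCounts = new HashMap<>();
    // Previously computed pair similarities, keyed by "smallerId:largerId".
    private final Map<String, Double> cache = new HashMap<>();

    /** Splits on non-letter characters and counts lower-cased words (a deliberately naive tokenizer). */
    void addDocument(long docId, String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        termCounts.put(docId, counts);
        cache.clear(); // cached values may be stale once a document changes
    }

    /** Returns the cached similarity for the pair, computing it on first use. */
    double similarity(long a, long b) {
        String key = Math.min(a, b) + ":" + Math.max(a, b);
        return cache.computeIfAbsent(key, k -> cosine(termCounts.get(a), termCounts.get(b)));
    }

    /** Cosine of the angle between two word-count vectors; 0.0 if either is missing or empty. */
    static double cosine(Map<String, Integer> x, Map<String, Integer> y) {
        if (x == null || y == null || x.isEmpty() || y.isEmpty()) {
            return 0.0;
        }
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            Integer other = y.get(e.getKey());
            if (other != null) {
                dot += e.getValue() * (double) other;
            }
        }
        return dot / (norm(x) * norm(y));
    }

    /** Euclidean length of a word-count vector. */
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }
}
```

To use it, call `addDocument` once per document and then `similarity(docA, docB)` wherever an item-item score is needed. Because the cache key orders the two IDs, each pair is computed at most once regardless of argument order. For a few thousand documents the full pair table should fit comfortably in memory, though that is an estimate and not something measured here.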
