I imagine everyone here is familiar with Findory. If not, too bad for you! ;) I loved Findory and used it on a daily basis until it shut down.
I'm wondering if there are lessons to be learned from what Greg Linden (Findory's papa - let's see if he has Google Ego Alert) wrote about Findory over the years: http://glinden.blogspot.com/search?q=findory

Here are a few excerpts (from the results of the above search) that seem important to me. I am wondering if people could comment on how Taste compares. I've also thrown in a few toy code sketches of my own, after excerpt 3 and in a P.S.

1. Findory's personalization used a type of hybrid collaborative filtering algorithm that recommended articles based on a combination of similarity of content and articles that tended to interest other Findory users with similar tastes. One way to think of this is that, when a person found and read an interesting article on Findory, that article would be shared with any other Findory readers who likely would be interested. Likewise, that person would benefit from interesting articles other Findory readers found. All this sharing of articles was done implicitly and anonymously, without any effort from readers, by Findory's recommendation engine.

Findory's news recommendations were unusual in that they were primarily based on user behavior (what articles other readers had found), worked from very little data (starting after a single click on Findory), worked in real-time (changed immediately when someone read an article), required no set-up or configuration (worked just by watching articles read), and did not require readers to identify themselves (no login necessary).

2. For most of Findory's four years, it ran on six servers. Findory's six servers were all cheap commodity Linux boxes, typically a single-core low-end AMD processor, 1GB of RAM, and a single IDE disk. Findory was cheap, cheap, cheap.

3. So, when someone comes to your personalized site, you need to load everything you need to know about them, find all the content that that person might like, rank and lay out that content, and serve up a pipin' hot page. All while the customer is waiting. Findory works hard to do all that quickly, almost always in well under 100ms. Time is money, after all, both in terms of customer satisfaction and the number of servers Findory has to pay for.

The way Findory does this is that it pre-computes as much of the expensive personalization as it can. Much of the task of matching interests to content is moved to an offline batch process. The online task of personalization, the part while the user is waiting, is reduced to a few thousand data lookups.

Even a few thousand database accesses could be prohibitive given the time constraints. However, much of the content and pre-computed data is effectively read-only data. Findory replicates the read-only data out to its webservers, making these thousands of lookups lightning fast local accesses. Read-write data, such as each reader's history on Findory, is in MySQL. MyISAM works well for this task since the data is not critical and speed is more important than transaction support. The read-write user data in MySQL can be partitioned by user id, making the database trivially scalable. The online personalization task scales independently of the number of Findory users. Only the offline batch process faced any issue of scaling as Findory grew, but that batch process can be done in parallel.

In the end, it is blazingly fast. Readers receive fully personalized pages in under 100ms. As they read new articles, the page changes immediately, no delay. It all just works.
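On #1: I have no idea what Findory's actual code looked like, but the blend itself seems conceptually simple. Here is a toy sketch of the idea in plain Java (all names and structures are mine, not Findory's and not Taste's): score each unread candidate by a weighted mix of content similarity to what the user has read and "readers who read X also read Y" covisitation counts.

import java.util.*;

/**
 * Toy hybrid recommender in the spirit of excerpt 1: blends content
 * similarity between articles with covisitation evidence.
 * Both data structures are assumed to be precomputed elsewhere.
 */
public class HybridScorer {

  // article -> term weights (e.g. TF-IDF), the content-based signal
  private final Map<String, Map<String, Double>> articleTerms;
  // article -> (other article -> # users who read both), the behavioral signal
  private final Map<String, Map<String, Integer>> covisits;
  private final double alpha; // weight on the behavioral signal, in [0,1]

  public HybridScorer(Map<String, Map<String, Double>> articleTerms,
                      Map<String, Map<String, Integer>> covisits,
                      double alpha) {
    this.articleTerms = articleTerms;
    this.covisits = covisits;
    this.alpha = alpha;
  }

  // cosine similarity between two articles' term vectors
  private double contentSim(String a, String b) {
    Map<String, Double> va = articleTerms.getOrDefault(a, Collections.emptyMap());
    Map<String, Double> vb = articleTerms.getOrDefault(b, Collections.emptyMap());
    double dot = 0, na = 0, nb = 0;
    for (double w : va.values()) na += w * w;
    for (double w : vb.values()) nb += w * w;
    for (Map.Entry<String, Double> e : va.entrySet()) {
      dot += e.getValue() * vb.getOrDefault(e.getKey(), 0.0);
    }
    return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
  }

  // fraction of a's covisits that go to b: "readers of a also read b"
  private double behaviorSim(String a, String b) {
    Map<String, Integer> row = covisits.getOrDefault(a, Collections.emptyMap());
    int together = row.getOrDefault(b, 0);
    int total = 0;
    for (int c : row.values()) total += c;
    return total == 0 ? 0 : (double) together / total;
  }

  // score every unread candidate against the user's reading history
  public List<String> recommend(Set<String> readByUser, Set<String> candidates, int howMany) {
    final Map<String, Double> scores = new HashMap<>();
    for (String c : candidates) {
      if (readByUser.contains(c)) continue; // never re-recommend
      double s = 0;
      for (String read : readByUser) {
        s += alpha * behaviorSim(read, c) + (1 - alpha) * contentSim(read, c);
      }
      scores.put(c, s);
    }
    List<String> ranked = new ArrayList<>(scores.keySet());
    ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
    return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
  }
}

In Taste terms, I'd guess this corresponds roughly to an item-based recommender whose item similarity is a weighted combination of a behavioral metric and a content-based one, but maybe someone closer to the code can confirm.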
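On #3: the recipe seems to be "precompute offline, replicate the read-only results to every webserver, and shard the read-write user data by user id." A minimal sketch of that online path, with in-memory maps standing in for the replicated read-only data and the partitioned MySQL (again, hypothetical names, my own illustration, not Findory's code):

import java.util.*;

/**
 * Toy sketch of the online serving path from excerpt 3: all expensive
 * work is assumed done by an offline batch job; the request path is
 * just local lookups plus one read/write against a partitioned store.
 */
public class OnlinePersonalizer {

  // Read-only data, built offline and replicated to every webserver:
  // article -> precomputed related articles.
  private final Map<String, List<String>> related;

  // Read-write store (stand-in for MySQL/MyISAM): one shard per
  // database, chosen by user id, so adding shards adds capacity.
  private final List<Map<Long, List<String>>> historyShards;

  public OnlinePersonalizer(Map<String, List<String>> related, int numShards) {
    this.related = related;
    this.historyShards = new ArrayList<>();
    for (int i = 0; i < numShards; i++) {
      historyShards.add(new HashMap<>());
    }
  }

  private Map<Long, List<String>> shardFor(long userId) {
    return historyShards.get((int) (userId % historyShards.size()));
  }

  // record a click; the only write on the request path
  public void recordRead(long userId, String article) {
    shardFor(userId).computeIfAbsent(userId, id -> new ArrayList<>()).add(article);
  }

  // build a page: one partitioned read for the user's history, then
  // purely local lookups into the replicated precomputed data
  public List<String> personalizedPage(long userId, int howMany) {
    List<String> history = shardFor(userId).getOrDefault(userId, Collections.emptyList());
    LinkedHashSet<String> page = new LinkedHashSet<>();
    for (String read : history) {
      for (String r : related.getOrDefault(read, Collections.emptyList())) {
        if (!history.contains(r)) page.add(r);
        if (page.size() >= howMany) return new ArrayList<>(page);
      }
    }
    return new ArrayList<>(page);
  }
}

The nice property excerpt 3 points out falls straight out of this shape: the request path is a fixed number of lookups regardless of how many users there are, the user store scales by adding shards, and only the offline batch job has to worry about growth.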
4. (quoting a Google paper from WWW2007 on Google News personalization) The paper tested three methods of making news recommendations on the Google News front page. From the abstract:

We describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts.

MinHash and PLSI are both clustering methods; a user is matched to a cluster of similar users, then they look at the aggregate behavior of users in that cluster to find recommendations.

OG: I think this is similar in principle to what I mentioned in an earlier email. (See the P.S. for a toy MinHash sketch.)

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
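P.S. For the curious, here's a rough sketch of the MinHash part as I understand it from the abstract (toy Java, my own names; an illustration of the general technique, not the paper's actual algorithm): hash each user's click history down to a short signature and treat the signature as a cluster id, so that users with high Jaccard overlap in their histories tend to collide.

import java.util.*;

/**
 * Toy MinHash clustering of users by their click histories: two users
 * get the same k-hash signature with probability ~ Jaccard(A, B)^k,
 * so a signature works as a "cluster of similar users" id.
 */
public class MinHashClusterer {

  private static final int PRIME = 2147483647; // 2^31 - 1
  private final int[] seedA;
  private final int[] seedB;

  public MinHashClusterer(int numHashes, long randomSeed) {
    Random rnd = new Random(randomSeed);
    seedA = new int[numHashes];
    seedB = new int[numHashes];
    for (int i = 0; i < numHashes; i++) {
      seedA[i] = 1 + rnd.nextInt(PRIME - 1);
      seedB[i] = rnd.nextInt(PRIME);
    }
  }

  // min-hash signature of a user's set of clicked article ids
  // (article ids assumed non-negative)
  public int[] signature(Set<Integer> clickedArticles) {
    int[] sig = new int[seedA.length];
    Arrays.fill(sig, Integer.MAX_VALUE);
    for (int article : clickedArticles) {
      for (int i = 0; i < sig.length; i++) {
        // universal hash: (a*x + b) mod p
        int h = (int) (((long) seedA[i] * article + seedB[i]) % PRIME);
        if (h < sig[i]) sig[i] = h;
      }
    }
    return sig;
  }

  // concatenated signature doubles as a cluster key
  public String clusterId(Set<Integer> clickedArticles) {
    return Arrays.toString(signature(clickedArticles));
  }
}

Recommending is then a matter of ranking articles by how many users in the same cluster clicked them, which, as I read it, is essentially the covisitation idea again at cluster granularity.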
