I imagine everyone here is familiar with Findory. If not, too bad for you! ;) I loved Findory and used it on a daily basis until it shut down.
I'm wondering if there are lessons to be learned from what Greg Linden (Findory's papa - let's see if he has Google Ego Alert) wrote about Findory over the years: http://glinden.blogspot.com/search?q=findory

Here are a few excerpts (from the results of the above search) that seem important to me. I am wondering if people could comment on how Taste compares. I've also thrown in a few toy code sketches of my own, after excerpt 3 and in a P.S.

1. Findory's personalization used a type of hybrid collaborative filtering algorithm that recommended articles based on a combination of similarity of content and articles that tended to interest other Findory users with similar tastes. One way to think of this is that, when a person found and read an interesting article on Findory, that article would be shared with any other Findory readers who likely would be interested. Likewise, that person would benefit from interesting articles other Findory readers found. All this sharing of articles was done implicitly and anonymously, without any effort from readers, by Findory's recommendation engine.

Findory's news recommendations were unusual in that they were primarily based on user behavior (what articles other readers had found), worked from very little data (starting after a single click on Findory), worked in real-time (changed immediately when someone read an article), required no set-up or configuration (worked just by watching articles read), and did not require readers to identify themselves (no login necessary).

2. For most of Findory's four years, it ran on six servers. Findory's six servers were all cheap commodity Linux boxes, typically a single-core low-end AMD processor, 1GB of RAM, and a single IDE disk. Findory was cheap, cheap, cheap.

3. So, when someone comes to your personalized site, you need to load everything you need to know about them, find all the content that that person might like, rank and lay out that content, and serve up a pipin' hot page. All while the customer is waiting. Findory works hard to do all that quickly, almost always in well under 100ms. Time is money, after all, both in terms of customer satisfaction and the number of servers Findory has to pay for.

The way Findory does this is that it pre-computes as much of the expensive personalization as it can. Much of the task of matching interests to content is moved to an offline batch process. The online task of personalization, the part while the user is waiting, is reduced to a few thousand data lookups.

Even a few thousand database accesses could be prohibitive given the time constraints. However, much of the content and pre-computed data is effectively read-only data. Findory replicates the read-only data out to its webservers, making these thousands of lookups lightning fast local accesses. Read-write data, such as each reader's history on Findory, is in MySQL. MyISAM works well for this task since the data is not critical and speed is more important than transaction support. The read-write user data in MySQL can be partitioned by user id, making the database trivially scalable. The online personalization task scales independently of the number of Findory users. Only the offline batch process faced any issue of scaling as Findory grew, but that batch process can be done in parallel.

In the end, it is blazingly fast. Readers receive fully personalized pages in under 100ms. As they read new articles, the page changes immediately, no delay. It all just works.
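On #1: I have no idea what Findory's actual code looked like, but the blend itself seems conceptually simple. Here is a toy sketch of the idea in plain Java (all names and structures are mine, not Findory's and not Taste's): score each unread candidate by a weighted mix of content similarity to what the user has read and "readers who read X also read Y" covisitation counts.

import java.util.*;

/**
 * Toy hybrid recommender in the spirit of excerpt 1: blends content
 * similarity between articles with covisitation evidence.
 * Both data structures are assumed to be precomputed elsewhere.
 */
public class HybridScorer {

  // article -> term weights (e.g. TF-IDF), the content-based signal
  private final Map<String, Map<String, Double>> articleTerms;
  // article -> (other article -> # users who read both), the behavioral signal
  private final Map<String, Map<String, Integer>> covisits;
  private final double alpha; // weight on the behavioral signal, in [0,1]

  public HybridScorer(Map<String, Map<String, Double>> articleTerms,
                      Map<String, Map<String, Integer>> covisits,
                      double alpha) {
    this.articleTerms = articleTerms;
    this.covisits = covisits;
    this.alpha = alpha;
  }

  // cosine similarity between two articles' term vectors
  private double contentSim(String a, String b) {
    Map<String, Double> va = articleTerms.getOrDefault(a, Collections.emptyMap());
    Map<String, Double> vb = articleTerms.getOrDefault(b, Collections.emptyMap());
    double dot = 0, na = 0, nb = 0;
    for (double w : va.values()) na += w * w;
    for (double w : vb.values()) nb += w * w;
    for (Map.Entry<String, Double> e : va.entrySet()) {
      dot += e.getValue() * vb.getOrDefault(e.getKey(), 0.0);
    }
    return (na == 0 || nb == 0) ? 0 : dot / Math.sqrt(na * nb);
  }

  // fraction of a's covisits that go to b: "readers of a also read b"
  private double behaviorSim(String a, String b) {
    Map<String, Integer> row = covisits.getOrDefault(a, Collections.emptyMap());
    int together = row.getOrDefault(b, 0);
    int total = 0;
    for (int c : row.values()) total += c;
    return total == 0 ? 0 : (double) together / total;
  }

  // score every unread candidate against the user's reading history
  public List<String> recommend(Set<String> readByUser, Set<String> candidates, int howMany) {
    final Map<String, Double> scores = new HashMap<>();
    for (String c : candidates) {
      if (readByUser.contains(c)) continue; // never re-recommend
      double s = 0;
      for (String read : readByUser) {
        s += alpha * behaviorSim(read, c) + (1 - alpha) * contentSim(read, c);
      }
      scores.put(c, s);
    }
    List<String> ranked = new ArrayList<>(scores.keySet());
    ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x)));
    return new ArrayList<>(ranked.subList(0, Math.min(howMany, ranked.size())));
  }
}

In Taste terms, I'd guess this corresponds roughly to an item-based recommender whose item similarity is a weighted combination of a behavioral metric and a content-based one, but maybe someone closer to the code can confirm.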
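On #3: the recipe seems to be "precompute offline, replicate the read-only results to every webserver, and shard the read-write user data by user id." A minimal sketch of that online path, with in-memory maps standing in for the replicated read-only data and the partitioned MySQL (again, hypothetical names, my own illustration, not Findory's code):

import java.util.*;

/**
 * Toy sketch of the online serving path from excerpt 3: all expensive
 * work is assumed done by an offline batch job; the request path is
 * just local lookups plus one read/write against a partitioned store.
 */
public class OnlinePersonalizer {

  // Read-only data, built offline and replicated to every webserver:
  // article -> precomputed related articles.
  private final Map<String, List<String>> related;

  // Read-write store (stand-in for MySQL/MyISAM): one shard per
  // database, chosen by user id, so adding shards adds capacity.
  private final List<Map<Long, List<String>>> historyShards;

  public OnlinePersonalizer(Map<String, List<String>> related, int numShards) {
    this.related = related;
    this.historyShards = new ArrayList<>();
    for (int i = 0; i < numShards; i++) {
      historyShards.add(new HashMap<>());
    }
  }

  private Map<Long, List<String>> shardFor(long userId) {
    return historyShards.get((int) (userId % historyShards.size()));
  }

  // record a click; the only write on the request path
  public void recordRead(long userId, String article) {
    shardFor(userId).computeIfAbsent(userId, id -> new ArrayList<>()).add(article);
  }

  // build a page: one partitioned read for the user's history, then
  // purely local lookups into the replicated precomputed data
  public List<String> personalizedPage(long userId, int howMany) {
    List<String> history = shardFor(userId).getOrDefault(userId, Collections.emptyList());
    LinkedHashSet<String> page = new LinkedHashSet<>();
    for (String read : history) {
      for (String r : related.getOrDefault(read, Collections.emptyList())) {
        if (!history.contains(r)) page.add(r);
        if (page.size() >= howMany) return new ArrayList<>(page);
      }
    }
    return new ArrayList<>(page);
  }
}

The nice property excerpt 3 points out falls straight out of this shape: the request path is a fixed number of lookups regardless of how many users there are, the user store scales by adding shards, and only the offline batch job has to worry about growth.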
4. (quoting a Google paper from WWW2007 on Google News personalization) The paper tested three methods of making news recommendations on the Google News front page. From the abstract:

We describe our approach to collaborative filtering for generating personalized recommendations for users of Google News. We generate recommendations using three approaches: collaborative filtering using MinHash clustering, Probabilistic Latent Semantic Indexing (PLSI), and covisitation counts.

MinHash and PLSI are both clustering methods; a user is matched to a cluster of similar users, then they look at the aggregate behavior of users in that cluster to find recommendations.

OG: I think this is similar in principle to what I mentioned in an earlier email. (See the P.S. for a toy MinHash sketch.)

Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
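P.S. For the curious, here's a rough sketch of the MinHash part as I understand it from the abstract (toy Java, my own names; an illustration of the general technique, not the paper's actual algorithm): hash each user's click history down to a short signature and treat the signature as a cluster id, so that users with high Jaccard overlap in their histories tend to collide.

import java.util.*;

/**
 * Toy MinHash clustering of users by their click histories: two users
 * get the same k-hash signature with probability ~ Jaccard(A, B)^k,
 * so a signature works as a "cluster of similar users" id.
 */
public class MinHashClusterer {

  private static final int PRIME = 2147483647; // 2^31 - 1
  private final int[] seedA;
  private final int[] seedB;

  public MinHashClusterer(int numHashes, long randomSeed) {
    Random rnd = new Random(randomSeed);
    seedA = new int[numHashes];
    seedB = new int[numHashes];
    for (int i = 0; i < numHashes; i++) {
      seedA[i] = 1 + rnd.nextInt(PRIME - 1);
      seedB[i] = rnd.nextInt(PRIME);
    }
  }

  // min-hash signature of a user's set of clicked article ids
  // (article ids assumed non-negative)
  public int[] signature(Set<Integer> clickedArticles) {
    int[] sig = new int[seedA.length];
    Arrays.fill(sig, Integer.MAX_VALUE);
    for (int article : clickedArticles) {
      for (int i = 0; i < sig.length; i++) {
        // universal hash: (a*x + b) mod p
        int h = (int) (((long) seedA[i] * article + seedB[i]) % PRIME);
        if (h < sig[i]) sig[i] = h;
      }
    }
    return sig;
  }

  // concatenated signature doubles as a cluster key
  public String clusterId(Set<Integer> clickedArticles) {
    return Arrays.toString(signature(clickedArticles));
  }
}

Recommending is then a matter of ranking articles by how many users in the same cluster clicked them, which, as I read it, is essentially the covisitation idea again at cluster granularity.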
