On Sat, Dec 5, 2009 at 4:27 PM, Sean Owen <[email protected]> wrote:

> The biggest pain for me here is how to rationalize all of this into an
> API. The current code is completely online. Now I'm dropping in a
> truly offline/distributed version, which is a totally different
> ballgame. And then there are all these hybrid approaches, computing
> some stuff offline and some online and requiring real-time integration
> with HDFS.


This is something I've been thinking about a lot too: at LinkedIn we do a
ton of offline Hadoop-based computation for recommendations, but then a
bunch of the work is (or can be) done online.  You can do that with Lucene,
as Ted suggests (and in fact that is one of our implementations), or by
precomputing a lot of results and storing them in a key-value store
(in our case, Voldemort).
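
To make that split concrete, here's a minimal sketch (not our actual code,
and the KeyValueStore interface is purely hypothetical): the offline Hadoop
side publishes precomputed top-N recommendations per user into a keyed
store, and the online side is nothing more than a fast lookup. In practice
the store could be Voldemort, a Lucene index, or anything with cheap keyed
reads.

  import java.util.Collections;
  import java.util.List;

  // Hypothetical store abstraction; stands in for Voldemort, Lucene, etc.
  interface KeyValueStore<K, V> {
    void put(K key, V value);
    V get(K key);
  }

  class PrecomputedRecommender {
    private final KeyValueStore<Long, List<Long>> store;

    PrecomputedRecommender(KeyValueStore<Long, List<Long>> store) {
      this.store = store;
    }

    // Called from the offline (Hadoop) side once the batch job finishes.
    void publish(long userId, List<Long> topNItemIds) {
      store.put(userId, topNItemIds);
    }

    // Called online, at request time: no model math, just a keyed read.
    List<Long> recommend(long userId) {
      List<Long> items = store.get(userId);
      return items != null ? items : Collections.<Long>emptyList();
    }
  }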

But having a nice API for *outputting* the precomputed matrices (which
are pretty big) into a format where online "queries"/recommendation
requests can be served is, I think, really key here.  We should think
much more about what makes the most sense.
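
As a straw man for discussion (these interfaces are made up, not a
proposal for the actual classes), the output API might look something
like this: the offline job writes the big matrix a row at a time in a
serving-friendly layout, and the online recommender only ever reads
individual rows at query time.

  import java.io.Closeable;
  import java.io.IOException;

  // Written by the offline/Hadoop side, one row (e.g. item-item
  // similarities for a single item) at a time.
  interface ModelWriter extends Closeable {
    void writeRow(long rowId, long[] columnIds, double[] values) throws IOException;
  }

  // Read by the online side; implementations could sit on Lucene,
  // Voldemort, HDFS, or a local file.
  interface ModelReader extends Closeable {
    // Returns the nonzero entries of one row, or null if the row is absent.
    SparseRow getRow(long rowId) throws IOException;
  }

  final class SparseRow {
    final long[] columnIds;
    final double[] values;
    SparseRow(long[] columnIds, double[] values) {
      this.columnIds = columnIds;
      this.values = values;
    }
  }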

  -jake
