"I" have an LSH implementation that might get itself contributed. I'm
in the middle of a desultory conversation with my colleagues about the
question of whether there is a good reason to retain it as a
closed-source item. I'm curious as to whether Mahout would be a
suitable home.

The implementation follows Petrovic. More to the point, I've worked
very hard to minimize its memory footprint so as to allow it to sit
there in memory indexing a very large collection of documents (of
course, actually, feature vectors). The scheme is that all the data
lives in live Java objects, and new items are also written to a log
(made from Google protocol buffers; I now realize Avro might have been
more to the point). There is a modularity that anticipates wanting to
run at a scale where it could no longer fit into memory.
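
To make the scheme above concrete, here is a minimal sketch, not the
actual implementation. It assumes random-hyperplane signatures over
several hash tables, and it uses a plain DataOutputStream as a stand-in
for the protocol-buffer log. The class name LshSketch and all of its
parameters are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

/** Hypothetical sketch: in-memory random-hyperplane LSH index plus an append-only log. */
public class LshSketch {
    private final int bits;
    private final double[][][] planes;   // [table][bit][dimension]
    private final List<Map<Integer, List<Integer>>> tables = new ArrayList<>();
    private final DataOutputStream log;  // assumption: stands in for a protobuf log
    private int nextId = 0;

    public LshSketch(int dim, int bits, int numTables, long seed, OutputStream logSink) {
        this.bits = bits;                // must be at most 31 to fit in an int signature
        this.planes = new double[numTables][bits][dim];
        Random rnd = new Random(seed);
        for (double[][] table : planes) {
            for (double[] plane : table) {
                for (int d = 0; d < dim; d++) plane[d] = rnd.nextGaussian();
            }
            tables.add(new HashMap<>()); // one bucket map per hash table
        }
        this.log = new DataOutputStream(logSink);
    }

    /** One bit per hyperplane: set when the vector lies on its non-negative side. */
    private int signature(int table, double[] v) {
        int sig = 0;
        for (int b = 0; b < bits; b++) {
            double dot = 0;
            for (int d = 0; d < v.length; d++) dot += planes[table][b][d] * v[d];
            if (dot >= 0) sig |= 1 << b;
        }
        return sig;
    }

    /** Appends the vector to the log, then indexes it in every table. Expects v.length == dim. */
    public int add(double[] v) {
        int id = nextId++;
        try {
            log.writeInt(id);
            log.writeInt(v.length);
            for (double x : v) log.writeDouble(x);
            log.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (int t = 0; t < tables.size(); t++) {
            tables.get(t).computeIfAbsent(signature(t, v), k -> new ArrayList<>()).add(id);
        }
        return id;
    }

    /** Ids that share a bucket with the query in at least one table. */
    public Set<Integer> candidates(double[] v) {
        Set<Integer> out = new LinkedHashSet<>();
        for (int t = 0; t < tables.size(); t++) {
            List<Integer> bucket = tables.get(t).get(signature(t, v));
            if (bucket != null) out.addAll(bucket);
        }
        return out;
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        LshSketch index = new LshSketch(4, 8, 3, 42L, sink);
        double[] v = {1.0, 0.5, -0.25, 2.0};
        int id = index.add(v);
        System.out.println(index.candidates(v).contains(id)); // same vector, same buckets
        System.out.println(sink.size() + " bytes logged");
    }
}
```

Because the log is written before the in-memory tables are updated, the
index could in principle be rebuilt by replaying the log through add().
That is one plausible reading of the design, not a statement of how the
real code recovers.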

Thus, its natural shape seems to me to be a web service, not a
map-reduce thing. Are we interested as a project? Does this
description make any sense?
