I have never claimed to be knowledgeable about map-reduce. I'm hoping
to learn here if I ever get anything interesting done.

If the documents arrive serially, map-reduce is uninteresting to scale
across documents.

However, there are the multiple hash tables.

Now, with the parameters from Petrovic, for a large (1M-document)
store you have ~72 tables. Yes, you could put 72 tables on 72 nodes:
map sends things to them, reduce collates the results.
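Roughly what I have in mind, as a sketch only (Python; the names and
the toy hash function are all made up, not anyone's actual code): each
table is owned by one node, map routes a document's per-table hash
keys to the owning node, and reduce collates the candidate matches.

from collections import defaultdict

NUM_TABLES = 72  # rough table count for a ~1M-document store with Petrovic's parameters

# One in-memory hash table per (imagined) node: hash key -> set of doc ids.
tables = [defaultdict(set) for _ in range(NUM_TABLES)]

def hash_for_table(doc, table_id):
    # Stand-in for a real LSH hash (minhash band, bit sample, ...); invented here.
    return hash((table_id, doc[: 8 + table_id % 4])) & 0xFFFF

def map_step(doc_id, doc):
    # Emit (table_id, (hash_key, doc_id)): table_id is the shard key, so each
    # record lands on the node that owns that table.
    for t in range(NUM_TABLES):
        yield t, (hash_for_table(doc, t), doc_id)

def reduce_step(table_id, records):
    # Runs on the node owning table_id: collect candidates, then insert.
    candidates = set()
    for key, doc_id in records:
        candidates |= tables[table_id][key]
        tables[table_id][key].add(doc_id)
    return candidates

def query(doc_id, doc):
    # Collate: union the per-table candidate sets for one incoming document.
    by_table = defaultdict(list)
    for table_id, record in map_step(doc_id, doc):
        by_table[table_id].append(record)
    candidates = set()
    for table_id, records in by_table.items():
        candidates |= reduce_step(table_id, records)
    return candidates

The catch, of course, is that "tables" has to survive between
documents, which is exactly the permanent state question below.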

I've never seen a Hadoop 'thing' that has permanent in-memory state
like this. I'm not sure where memory mapping comes into the picture.
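If memory mapping is supposed to stand in for that permanent state,
the only way I can picture it is something like the following (again
just a sketch; the file format and names are invented): persist each
table to disk once, and have every task mmap it read-only, so the OS
page cache carries it between otherwise stateless tasks.

import mmap
import struct

PATH = "table_00.bin"  # hypothetical on-disk image of one hash table

def write_table(pairs):
    # Store fixed-width (hash_key, doc_ordinal) pairs, sorted by key,
    # as little-endian unsigned 32-bit ints.
    with open(PATH, "wb") as f:
        for key, doc in sorted(pairs):
            f.write(struct.pack("<II", key, doc))

def lookup(key):
    # Binary-search the mmapped pairs for the leftmost match, then scan
    # forward; nothing beyond the hits is pulled into the Python heap.
    with open(PATH, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        rec = struct.calcsize("<II")
        n = len(mm) // rec
        lo, hi = 0, n
        while lo < hi:
            mid = (lo + hi) // 2
            k, _ = struct.unpack_from("<II", mm, mid * rec)
            if k < key:
                lo = mid + 1
            else:
                hi = mid
        hits = []
        while lo < n:
            k, doc = struct.unpack_from("<II", mm, lo * rec)
            if k != key:
                break
            hits.append(doc)
            lo += 1
        mm.close()
        return hits

# e.g. write_table([(0x2a, 7), (0x2a, 9), (0x3b, 1)]); lookup(0x2a) -> [7, 9]

That works for lookups, but it doesn't answer how you keep inserting
new documents into the tables as they arrive.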

If folks are game, I'll poke the question of contribution some more.

Of course, this is, as always, a question of overall throughput
versus response time. My thinking about this is colored by caring
about immediate response. Up to some scale, the money to buy a
cluster will buy a whole lot of cores and memory in one machine, and
a whole lot of very-low-overhead parallelism as a result.
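By low-overhead parallelism I mean roughly this (a sketch, with made-up
names): keep all ~72 tables in one process and probe them concurrently
per query, so the only "distribution" cost is scheduling threads, not
shipping records across a network.

from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_TABLES = 72
tables = [defaultdict(set) for _ in range(NUM_TABLES)]  # all in one address space

def probe(table_id, key):
    # Read-only probe of one in-memory table; cheap enough to fan out per query.
    return tables[table_id].get(key, set())

def query_all(keys):
    # keys[i] is the query document's hash key for table i; probe every table
    # concurrently in the same process, then union the candidates.
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = pool.map(probe, range(NUM_TABLES), keys)
        return set().union(*results)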
